Abstract


How many shuffles are needed to mix up a deck of cards? This question may
be answered in the language of a random walk on the symmetric group, $S_{52}$.
This generalises neatly to the study of random walks on finite groups,
themselves a special class of Markov chains. Ergodic random walks exhibit
nice limiting behaviour, and both the quantitative and qualitative aspects of
the convergence to this limiting behaviour are examined. A particular
qualitative behaviour, the cut-off phenomenon, occurs in many examples. For
random walks exhibiting this behaviour, after a period of time, convergence
to the limiting behaviour is abrupt.

The aim of this thesis is to present the general theory of random walks 
on finite groups, with a particular emphasis on the cut-off phenomenon. It 
is an open problem to determine which random walks exhibit the cut-off 
phenomenon. There are various formulations of the cut-off phenomenon; 
the original — that of variation distance cut-off — is considered here. At 
present, progress is made on this problem on a case-by-case basis. There are
general techniques for attacking a particular case — and many of these are 
presented here — but there are no truly universal results. 

Throughout the thesis, examples are used to demonstrate the theory. 
The last chapter presents some new heuristics developed by the author in 
the course of his studies. 
Chapter 1

Introduction 


The question, how many shuffles are required to mix up a deck of cards,
does not appear to have an obvious mathematical answer. Before any kind 
of analysis can be done, the terms deck of cards, shuffle and mixed up need 
a precise mathematical realisation. 

Consider a fresh deck of cards in a fixed known order, running from king
down to ace within each suit in turn. In this order, the cards can be
labelled 1, ..., 52, and given any arrangement of the deck, a permutation
$\sigma : \{1,\dots,52\} \to \{1,\dots,52\}$ can encode the arrangement:

\[
\begin{pmatrix} 1 & 2 & \cdots & 52 \\ \sigma(1) & \sigma(2) & \cdots & \sigma(52) \end{pmatrix}
\]

In the language of group theory, the deck of cards may be modelled by $S_{52}$.
A shuffle, meanwhile, takes the deck and, independently¹ of the arrangement
of the deck, permutes the cards. For example, a perfect cut, which takes off
the top half of the cards and places it under the bottom half of the deck, is
a shuffle. It is not hard to see that a shuffle is a function
$S : S_{52} \to S_{52}$ whose action is by multiplication by some
$\sigma_S \in S_{52}$; i.e. $S(\sigma) = \sigma_S\sigma$. Indeed the
perfect-cut shuffle is realised by multiplication by
$(1, 27)(2, 28) \cdots (26, 52)$.
Now the question of when a deck is mixed up needs to be addressed. In the
first instance, it is always assumed that the deck started in some known
order; e.g. the one given above. Secondly, when is a deck totally random?



¹ In general (!), one doesn't shuffle while looking at the labels on the cards. To be
technical, not all functions $S : S_{52} \to S_{52}$ are considered shuffles. For
example, the ‘shuffle’ swapping the positions of two named cards (say, two
particular aces) is not a shuffle.
If one is handed a deck of cards, face down, and if each possible order of
the cards is equally likely then the deck is considered random. It should
be clear from group theory that if any perfect (that is, deterministic)
shuffle is repeated, then the deck will never get random in this sense. If
the deck is always shuffled by $\sigma_S$, then after $k$ shuffles the deck
will be in the order $\sigma_S^k$. Hence, to get random, there has to be some
randomisation in how the deck is shuffled. As an example of a suitable
randomisation, pick two distinct cards at random², and let the shuffle swap
the positions of these two cards. Assuming now that after a number of
shuffles, every arrangement of the deck is approximately equally likely,
various notions of ‘how close’ the deck is to random may be formulated, and
a clear definition of mixed-up may be given.
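The claim that a repeated deterministic shuffle never randomises the deck can be checked directly for the perfect cut. The following is a minimal sketch, assuming positions are indexed from 0 and using helper names (`perfect_cut`, `compose`, `arrangements_reached`) that are illustrative only and do not come from the thesis.

```python
def perfect_cut(n=52):
    """Perfect cut on n cards: the card in position i moves to position
    (i + n/2) mod n. With 0-indexed positions and n = 52 this is the
    permutation written (1,27)(2,28)...(26,52) in 1-indexed notation."""
    half = n // 2
    return tuple((i + half) % n for i in range(n))


def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(p)))


def arrangements_reached(shuffle, steps):
    """Collect every arrangement seen when one fixed shuffle is repeated."""
    current = tuple(range(len(shuffle)))  # the fresh deck (identity)
    seen = {current}
    for _ in range(steps):
        current = compose(shuffle, current)
        seen.add(current)
    return seen


cut = perfect_cut(52)
reached = arrangements_reached(cut, 1000)
print(len(reached), "arrangements reached out of 52! possible")
```

However many times the cut is repeated, only the powers of a single permutation are ever visited, which is why some randomisation in the choice of shuffle is needed.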

Consider the riffle shuffle: at each step the deck is cut into two packs
which are then riffled together. A model for such shuffles on $n$ rather than
just 52 cards, due to Gilbert, Shannon and (independently) Reeds, was
completely analysed in a remarkable paper by Bayer & Diaconis [6]. In this
paper, a phenomenon called the cut-off phenomenon was proven to occur
for the riffle shuffle. Namely, for $n$ large, the deck is far from random in
a certain sense after fewer than $t_n = (3\log_2 n)/2$ shuffles, but close to
random after more than $t_n$ shuffles: the transition from order to random
takes place at about $t_n$ steps and it makes sense to say it takes $t_n$
steps to mix up the cards. For the case $n = 52$, seven shuffles are
necessary and sufficient to mix up the cards.

Random walks on finite groups generalise card shuffling by replacing the
symmetric group by any finite group. This thesis aims to present the general
theory of random walks on finite groups, with an emphasis on the cut-off
phenomenon. In particular, care has been taken to take no liberties with
assumptions, and all the ‘obvious’ elements of the theory are revisited and
questioned. For example, Theorem 1.3.2 is standard in the field but almost
no references carry the non-trivial proof. The questioning of ‘obvious’
facets of the theory allowed some new perspectives.


² To be careful, maybe two distinct card positions, e.g. top card, second card, etc.
In making the thesis modest, some interesting and often powerful aspects
of the theory have been omitted. The aforementioned riffle shuffle was not
studied, and neither was the familiar overhand shuffle. In fact, in terms of
the development of the subject, the riffle shuffle is a pathological example.
Despite its apparent complexity, the shuffle has been more or less completely
understood and analysed by Bayer & Diaconis, albeit through some deeper
mathematics than the subject usually requires.

The Diaconis-Fourier theory is an attractive machinery in the field that is
presented here. However, it is only applied in two Abelian examples, neither
of which requires the full theory anyway. Its greatest success has been
in the analysis of the random transposition shuffle, a random walk on the
symmetric group; however, the representation theory of the symmetric group
is not covered here. Diaconis [?] is an excellent reference. A great survey
of techniques, including those not mentioned here, is [27].

There are a number of interesting generalisations of random walks on groups,
such as to homogeneous spaces and Gelfand pairs. These are not covered here:
Ceccherini-Silberstein et al. [7] is an excellent book that pursues these areas.

Despite these restrictions, a great variety of mathematical techniques are
used. Probability, measure theory, representation theory, functional
analysis, geometry and, naturally, group theory are used throughout the thesis.
The cut-off phenomenon is not just a theory for random walks on groups: it
occurs for some more general Markov chains also. A breakthrough in the
theory of random walks on groups will surely have an impact for the Markov
chain community. In his introduction, Chen [8] discusses a few examples
where the existence of a cut-off has a significant impact for applications.

This first chapter introduces the general discrete time Markov chain
theory on a finite set. Random walks on groups are introduced as a special
class of Markov chains and necessary and sufficient conditions for a random
walk to ‘get random’ are developed.

Chapter 2 discusses what it means for a random walk to be ‘close to
random’. A number of measures of closeness to random are introduced. A
distinguished distance, namely the variation distance, is identified as the
conventional measure of closeness to random in this study. An interpretation
of variation distance by Switzer is shown to be correct here. Much of
the spectral analysis of the stochastic operator is done in this chapter and
this yields upper bounds on the distance to random, many related to the
eigenvalues of the associated stochastic operator. Next, techniques for
finding lower bounds on the distance to random are discussed. Finally,
methods of procuring bounds for these eigenvalues via the geometry of the
group are presented.

Chapter 3 develops the representation theory of finite groups. In
conjunction with Fourier analysis for finite groups, this machinery, so well
pioneered by Diaconis, is a powerful technique for generating bounds on the
distance to random. Here the full, general theory is developed. Two Abelian
examples, the simple walk on the circle and the simple walk with loops on the
n-Cube, are analysed.

Chapter 4 introduces the cut-off phenomenon and its formulation. In
particular, it is seen that the phenomenon is defined with respect to the
limiting behaviour of a family of random walks on groups,
$\{G_n : n \in \mathbb{N}\}$, as the size of the group increases to infinity
($n \to \infty$). There is a discussion of the present understanding of the
cut-off phenomenon, and reasons for its existence are mentioned.

Chapter 5 presents some probabilistic methods for bounding the distance to
random. These powerful methods, strong uniform times and coupling, are
occasionally very transparent and help explain why cut-offs occur.

Finally in Chapter 6 some new viewpoints and generalisations are presented.
Although the motion of a particle in a random walk is random (in general,
after $k$ steps the position of the particle is unknown), its distribution
after $k$ steps is deterministic. Thus the random walk has the structure of a
dynamical system. Here an attempt is made to develop this further. Also the
question of whether or not the invertibility of the stochastic operator has
implications for a random walk is addressed. A study of invertible stochastic
operators is, as far as this author knows, non-existent in the literature. A
few basic properties and questions are explored. Finally, a conjecture of the
author, namely that if the stochastic operator is invertible, then the cut-off
phenomenon will not be exhibited, is explored and disproved.
1.1 Markov Chain Theory


Essentially, a Markov Chain is a mathematical model for a certain type of
discrete motion of a particle in a space. The particle begins at some initial
point and at successive times moves to another point in the space chosen
‘at random’. The probability that the particle moves to a certain point $y$
at a time $t$ is dependent only upon its position $x$ at the previous time.
This is the Markov property.

To formulate, let $X$ be a finite set. Denote by $M_p(X)$ the probability
measures on $X$. Let $\delta^x$ be the element of $M_p(X)$ which puts a
measure of 1 on $x$ (and zero elsewhere). These Dirac measures,
$\{\delta^x : x \in X\}$, are the canonical basis for
$\mathbb{R}^{|X|} \supset M_p(X)$. A probability measure $\nu \in M_p(X)$ is
strict if $\nu(x) > 0$ for all $x \in X$. Denote by $F(X)$ the complex
functions on $X$ and by $L(V)$ the linear operators on a vector space $V$.
The similarly defined Dirac functions, $\{\delta_x : x \in X\}$, are the
canonical basis for $F(X)$. With respect to this basis $P \in L(F(X))$ has a
matrix representation $[p(x,y)]_{xy}$. $P \in L(F(X))$ is a stochastic
operator if:

(i) $p(x,y) \geq 0$, $\forall\, x,y \in X$

(ii) $\sum_{y \in X} p(x,y) = 1$, $\forall\, x \in X$ (row sum is unity)

Given $\nu \in M_p(X)$, a stochastic operator $P$ acts on $\nu$ as
$\nu P(x) = \sum_{y} \nu(y)\,p(y,x)$. Stochastic operators are readily
characterised without using matrix elements as being $M_p(X)$-stable, in the
sense that $M_p(X)P \subset M_p(X)$ if and only if $P$ is a stochastic
operator. It is an immediate consequence that if $P$ and $Q$ are stochastic,
then so is $PQ$.
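The two defining conditions and the action $\nu \mapsto \nu P$ translate directly into code. The sketch below uses exact rational arithmetic; the $3 \times 3$ matrix `P` and the measure `nu` are made-up illustrations, and the function names are assumptions rather than notation from the text.

```python
from fractions import Fraction as F


def is_stochastic(P):
    """Conditions (i) and (ii): non-negative entries and unit row sums."""
    return all(all(entry >= 0 for entry in row) and sum(row) == 1 for row in P)


def act(nu, P):
    """Row vector times matrix: (nu P)(x) = sum_y nu(y) p(y, x)."""
    size = len(P)
    return [sum(nu[y] * P[y][x] for y in range(size)) for x in range(size)]


def matmul(P, Q):
    """Matrix product PQ of two square matrices of the same size."""
    size = len(P)
    return [[sum(P[x][z] * Q[z][y] for z in range(size)) for y in range(size)]
            for x in range(size)]


# Hypothetical stochastic operator on a three-point set.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(0), F(1, 4), F(3, 4)]]
nu = [F(1, 2), F(1, 4), F(1, 4)]

print(act(nu, P))                    # again a probability measure
print(is_stochastic(matmul(P, P)))   # products of stochastic operators
```

Here `act(nu, P)` sums to 1 again, illustrating the $M_p(X)$-stability, and `matmul(P, P)` passes the same check, illustrating closure under products.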
1.1.1 Definition

Let $X$ be a finite set and $\nu \in M_p(X)$, $P$ a stochastic operator on
$X$, and $(Y, \mathcal{A}, \mu)$ a probability space. A sequence
$\{\xi_k\}_{k=0}^{n}$ of random variables $\xi_k : Y \to X$ is a Markov Chain
with initial distribution $\nu$ and stochastic operator $P$, if

(i) $\mu(\xi_0 = x_0) = \nu(x_0)$.

(ii) $\mu(\xi_{k+1} = x_{k+1} \mid \xi_0 = x_0, \dots, \xi_k = x_k) = p(x_k, x_{k+1})$,

assuming $\mu(\xi_0 = x_0, \dots, \xi_k = x_k) > 0$.

If $\nu = \delta^x$ in (i) the Markov chain is said to start
deterministically at $x$. Condition (ii) is the Markov property. Subsequent
references to a Markov Chain $\xi$ refer to a Markov Chain
$(\{\xi_k\}_{k=0}^{n}, \delta^x)$.

In terms of existence, given $\nu$ and $P$, let

\[
Y := X^{n+1} = \underbrace{X \times X \times \cdots \times X}_{n+1 \text{ copies}}
\]

Define $\xi_k : Y \to X$ by $\xi_k(x_0, \dots, x_n) = x_k$ and

\[
\mu(x_0, \dots, x_n) = \nu(x_0)\,p(x_0, x_1) \cdots p(x_{n-1}, x_n).
\]

Then $\mu \in M_p(Y)$, and $\{\xi_k\}_{k=0}^{n}$ is a Markov Chain for $\nu$
and $P$.

1.1.2 Example: Two State Markov Chain

Consider the set $X = \{1, 2\}$ and $\nu \in M_p(X)$. Suppose the probability
of going from 1 to 2 is $p$ and the probability of going from 2 to 1 is $q$.
Then the two state Markov chain has stochastic operator

\[
P = \begin{pmatrix} 1 - p & p \\ q & 1 - q \end{pmatrix}
\]

for $p, q \in [0, 1]$.

Figure 1.1: A graphical representation of the two state Markov chain.

1.2 Ergodic Theory 

Ergodic theory is concerned with the long-time behaviour of a Markov chain.
A central question is whether or not a given chain displays limiting
behaviour as $k \to \infty$. If ‘$\xi_\infty$’ exists, what is its
distribution?

One possible obstruction to the existence of a limit is periodicity. Consider
a Markov chain $\xi$ on a set $X = X_0 \cup X_1$ with
$X_0 \cap X_1 = \emptyset$ and neither of the $X_i = \emptyset$ for
$i = 0, 1$. Suppose $\xi$ has the property that $\xi_{2k+i} \in X_i$, for
$k \in \mathbb{N}_0$, $i = 0, 1$. Then ‘$\xi_\infty$’ cannot exist in the
obvious way. In a certain sense $\xi$ must be aperiodic for limiting
behaviour to exist.

Suppose $\xi$ is a Markov chain and the limit $\nu P^n \to \theta$ exists.
Loosely speaking, after a long time $N$, $\xi_N$ has distribution
$\mu(\xi_N) \sim \theta$:

\[
\nu P^N \sim \theta
\;\Rightarrow\; \nu P^N P \sim \theta P
\;\Rightarrow\; \nu P^{N+1} \sim \theta P
\]

But $\nu P^{N+1} \sim \theta$ also and hence $\theta P \sim \theta$. So if
‘$\xi_\infty$’ exists then its distribution $\theta$ may have the property
$\theta P = \theta$. Such a distribution is said to be a stationary
distribution for $P$. Relaxing the supposition on ‘$\xi_\infty$’ existing,
do stationary distributions exist? Clearly they are left eigenvectors of
eigenvalue 1 that have non-negative entries summing to 1.
If $k(x) \in F(X)$ is any constant function then $Pk = k$, so $k$ is a right
eigenfunction of eigenvalue 1. Let $u$ be a left eigenvector of eigenvalue 1
(one exists because $P$ and its transpose have the same eigenvalues). By
the triangle inequality,
$|u(x)| = \left|\sum_y u(y)\,p(y,x)\right| \leq \sum_y |u(y)|\,p(y,x)$. Now

\[
\sum_{z \in X} |u(z)|
\leq \sum_{z \in X}\Big(\sum_{y \in X} |u(y)|\,p(y,z)\Big)
= \sum_{y \in X} |u(y)| \underbrace{\Big(\sum_{z \in X} p(y,z)\Big)}_{=1}
= \sum_{y \in X} |u(y)|
\]

Hence the inequality is an equality, so
$\sum_{z}\big[\sum_y |u(y)|\,p(y,z) - |u(z)|\big] = 0$ is a sum of
non-negative terms, each of which must therefore vanish. Hence
$|u|P = |u|$, and by a scaling, $\pi(x) := |u(x)| / \sum_y |u(y)|$ is a
stationary distribution.

How many stationary distributions exist? Consider Markov Chains $\xi$
and $\zeta$ on disjoint finite sets $X$ and $Y$, with stochastic operators
$P$ and $Q$. The block matrix

\[
R = \begin{pmatrix} P & 0 \\ 0 & Q \end{pmatrix}
\]

is a stochastic operator on $X \cup Y$. If $\pi$ and $\theta$ are stationary
distributions for $P$ and $Q$ then

\[
\phi_c = (c\pi, (1 - c)\theta), \quad c \in [0, 1]
\]

is an infinite family of stationary distributions for $R$. The dynamics of
this walk are that if the particle is in $X$ it stays in $X$, and vice versa
for $Y$ (the graph of $R$ has two disconnected components). This example
shows that, in general, the stationary distribution need not be unique.
Rosenthal [26] shows that a sufficient condition for uniqueness is that the
Markov chain $\xi$ has the property that every point is accessible from any
other point; i.e. for all $x, y \in X$, there exists $r(x,y) \in \mathbb{N}$
such that $p^{(r(x,y))}(x,y) > 0$. A Markov chain satisfying this property is
said to be irreducible.

So for the existence of a unique stationary distribution it may be
sufficient that the Markov chain is both aperiodic and irreducible. Call a
stochastic operator $P$ ergodic if there exists $n_0 \in \mathbb{N}$ such that

\[
p^{(n_0)}(x, y) > 0, \quad \forall\, x, y \in X
\]

In fact, ergodicity is equivalent to aperiodic³ and irreducible (see [26],
Lemma 8.3.9), and the following theorem asserts that it is both a necessary
and sufficient condition for the existence of a strict distribution for
‘$\xi_\infty$’. These preceding remarks suggest the distribution of
‘$\xi_\infty$’ is in fact stationary and unique, and indeed this will be
seen to be the case. A nice, non-standard proof of this well-known theorem is
to be found in [7].

³ Although aperiodic hasn't been defined here.

1.2.1 Markov Ergodic Theorem

A stochastic operator $P$ is ergodic if and only if there exists a strict
$\pi \in M_p(X)$ such that

\[
\lim_{n \to \infty} p^{(n)}(x, y) = \pi(y), \quad \forall\, x, y \in X \tag{1.3}
\]

In this case $\pi$ is the unique stationary distribution for $P$.
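The limit in the theorem can be observed numerically on the two state chain of Example 1.1.2. The sketch below assumes the particular values $p = 1/3$, $q = 1/4$ (chosen only for illustration) and uses the closed form $\pi = (q/(p+q),\, p/(p+q))$, which is not stated in the text but is easily checked to satisfy $\pi P = \pi$.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of two square matrices of the same size."""
    size = len(A)
    return [[sum(A[x][z] * B[z][y] for z in range(size)) for y in range(size)]
            for x in range(size)]


def matpow(A, n):
    """n-th power of A by repeated multiplication (n >= 0)."""
    size = len(A)
    result = [[F(int(x == y)) for y in range(size)] for x in range(size)]
    for _ in range(n):
        result = matmul(result, A)
    return result


# Illustrative transition probabilities (assumed, not from the text).
p, q = F(1, 3), F(1, 4)
P = [[1 - p, p], [q, 1 - q]]
pi = [q / (p + q), p / (p + q)]

# Every row of P^30 should be close to pi, whatever the starting state.
P30 = matpow(P, 30)
gap = max(abs(P30[x][y] - pi[y]) for x in range(2) for y in range(2))
print(pi, float(gap))
```

Both rows of `P30` agree with `pi` to many decimal places, which is the sense in which the chain ‘forgets’ its initial state.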

In the special class of ergodic Markov chains, (1.3) indicates that,
statistically speaking, a system that evolves for a long time ‘forgets’ its
initial state. Another special class of Markov chains are reversible Markov
chains.

A stochastic operator $P$ is reversible if there exists a strict
$\pi \in M_p(X)$ such that

\[
\pi(x)\,p(x,y) = p(y,x)\,\pi(y), \quad \forall\, x, y \in X \tag{1.4}
\]

This is equivalent to $D_\pi P = P^T D_\pi$, where $D_\pi$ is the diagonal
matrix with $(x,x)$-component $\pi(x)$. Suppose further that $P$ is ergodic
and (1.4) holds for some strict $\pi \in M_p(X)$. A quick calculation shows
that then $\pi$ is the unique, strict, stationary distribution. The
definition of a reversible chain appears at odds with our interpretation of
what reversible means. However, it may be shown (see [?]) that the condition
is equivalent to

(i) $p(x,y) > 0 \Rightarrow p(y,x) > 0$

(ii) for all $n \in \mathbb{N}$, $x_0, x_1, \dots, x_n \in X$,

\[
p(x_0,x_1)\,p(x_1,x_2) \cdots p(x_{n-1},x_n)\,p(x_n,x_0)
= p(x_0,x_n)\,p(x_n,x_{n-1}) \cdots p(x_1,x_0)
\]

Figure 1.2: For a reversible Markov Chain, the probability of going in a
cycle from 0 → 0 is equal for clockwise and anti-clockwise orientations.


1.3 Random Walks on Finite Groups 

1.3.1 Introduction 

A particularly nice class of Markov chain is that of a random walk on a 
group. The particle moves from group element to group element by choosing 
an element h of the group ‘at random’ and moving to the product of h 
and the present position g, i.e. the particle moves from g to hg. To avoid 
trivialities, the random walk on the trivial group is not considered. Naturally
the group structure of the walk induces strong symmetry conditions: this
allows the generation of much stronger results than those of general Markov
chain theory.


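To formulate, let $G$ be a finite group of order $|G|$ and identity $e$. Let
$\nu \in M_p(G)$ and $(Y, \mu)$ be a probability space. Let
$\{\zeta_k\}_{k=0}^{n} : (Y, \mu) \to G$ be a sequence of independent random
variables with distributions $\mu(\zeta_0 = g_0) = \delta^e(g_0)$ and
$\mu(\zeta_k = g) = \nu(g)$ for $k \geq 1$ (so that $\zeta_1, \zeta_2, \dots$
are i.i.d.). The sequence of random variables
$\{\xi_k\}_{k=0}^{n} : (Y, \mu) \to G$,

\[
\xi_k = \zeta_k \zeta_{k-1} \cdots \zeta_1 \zeta_0 \tag{1.5}
\]

is a right-invariant random walk on $G$.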
This construction makes $\xi$ into a Markov Chain on $G$ with initial
distribution $\delta^e$, and its stochastic operator $P = [p(s,t)]$ is
induced by the driving probability $\nu$: $p(s,t) = \nu(ts^{-1})$. The random
walk is called right invariant because $p(s,t) = p(sh,th)$. This is obvious as

\[
p(sh, th) = \nu(th(sh)^{-1}) = \nu(ts^{-1}) = p(s,t)
\]
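The formula $p(s,t) = \nu(ts^{-1})$ and right invariance can be checked exhaustively on a small non-Abelian group. The following sketch takes $G = S_3$, stored as tuples, with a driving probability `nu` invented purely for illustration; it also confirms that both row and column sums of the induced operator equal 1.

```python
from fractions import Fraction as F
from itertools import permutations


def compose(a, b):
    """Group product ab: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(a)))


def inverse(a):
    """Inverse permutation of a."""
    inv = [0] * len(a)
    for i, image in enumerate(a):
        inv[image] = i
    return tuple(inv)


G = list(permutations(range(3)))  # the six elements of S_3
e = (0, 1, 2)

# Hypothetical driving probability: identity, one transposition, one 3-cycle.
nu = {g: F(0) for g in G}
nu[e] = F(1, 2)
nu[(1, 0, 2)] = F(1, 4)
nu[(1, 2, 0)] = F(1, 4)


def p(s, t):
    """Transition probability p(s, t) = nu(t s^{-1})."""
    return nu[compose(t, inverse(s))]


right_invariant = all(p(compose(s, h), compose(t, h)) == p(s, t)
                      for s in G for t in G for h in G)
print(right_invariant)
```

The check holds for any choice of `nu`, since it only uses $(sh)^{-1} = h^{-1}s^{-1}$.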

Example: Card Shuffling 

Card shuffling provides the motivation for the study of random walks on
groups and remains the canonical example. Everyday shuffles such as the
overhand shuffle or the riffle shuffle, as well as simpler but more tractable
examples such as top-to-random or random transpositions, all have the
structure of a random walk on $S_{52}$. Each shuffle may be realised as
sampling from a probability distribution $\nu \in M_p(S_{52})$. For example,
consider the case of repeated random transpositions. A random transposition
consists of choosing two cards at random (with replacement) from the deck and
swapping the positions of these two cards. Suppose without loss of generality
that the first card chosen is the ace of spades. The probability of choosing
the ace of spades again is 1/52. Swapping the ace of spades with itself leaves
the deck unchanged. The choice of the first card is independent, hence the
probability that the shuffle leaves the deck unchanged is 1/52. What is the
probability of transposing two given (distinct) cards? Consider, again
without loss of generality, the probability of transposing the ace of spades
and the ace of hearts. There are two ways this may be achieved: choose the
ace of spades then the ace of hearts, or choose the ace of hearts then the
ace of spades. Both of these have probability $1/52^2$. Any other given
shuffle (not leaving the deck unchanged or transposing two cards) is
impossible. Hence the shuffle may be modelled as sampling by

\[
\nu(s) := \begin{cases}
1/52 & \text{if } s = e \\
2/52^2 & \text{if } s \text{ is a transposition} \\
0 & \text{otherwise}
\end{cases}
\]
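The counting argument behind $\nu$ can be verified by brute force on a smaller deck. The sketch below assumes a deck of $n = 4$ cards (so that all $4!$ arrangements can be listed) and compares the formula with direct enumeration of the $n^2$ equally likely ordered choices; the function names are illustrative.

```python
from fractions import Fraction as F
from itertools import permutations, product


def random_transposition_measure(n):
    """nu from the formula: 1/n at e, 2/n^2 at transpositions, 0 elsewhere."""
    nu = {}
    for s in permutations(range(n)):
        moved = sum(1 for i in range(n) if s[i] != i)
        if moved == 0:
            nu[s] = F(1, n)
        elif moved == 2:          # exactly two points moved: a transposition
            nu[s] = F(2, n * n)
        else:
            nu[s] = F(0)
    return nu


def measure_from_sampling(n):
    """nu obtained by enumerating every ordered pair of chosen positions."""
    counts = {s: 0 for s in permutations(range(n))}
    for i, j in product(range(n), repeat=2):
        deck = list(range(n))
        deck[i], deck[j] = deck[j], deck[i]   # i == j leaves the deck alone
        counts[tuple(deck)] += 1
    return {s: F(c, n * n) for s, c in counts.items()}


nu = random_transposition_measure(4)
print(sum(nu.values()), nu == measure_from_sampling(4))
```

The total mass is $1/n + \binom{n}{2} \cdot 2/n^2 = 1$, as it must be for a probability measure.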
It is a straightforward calculation to show that the stochastic operator
of a random walk on a group is doubly stochastic: column sums are also 1.
As a corollary, the uniform distribution, $\pi(g) = 1/|G|$, is a strict,
stationary distribution. To keep terminology to a minimum, the uniform
distribution shall be referred to as the random distribution and conversely
$\pi$ will refer to this random distribution.

If $\Sigma = \operatorname{supp}(\nu)$ then, in general,
$\Sigma^k \not\subset \Sigma^{k+1}$; however, if $\langle\Sigma\rangle = G$
and $e \in \Sigma$ then certainly $\Sigma^k \subset \Sigma^{k+1}$ for any
$k \geq 1$. Indeed:

\[
\{e\} \subset \Sigma \subset \Sigma^2 \subset \cdots \subset \Sigma^T = G
\]

where $T$ is called the cover time of the walk. In this case $P$ is ergodic
with $n_0 = T$. From Section 1.2 it is known that ‘$\xi_\infty$’ exists in a
nice way if the stochastic operator $P$ is ergodic. Conveniently, this
condition may be translated into a condition on the driving probability on
the group, $\nu$. The theorem below falls under the category of a ‘folklore
theorem’ in that almost all references refer to the proof in older,
hard-to-source references, if at all. A proof outline is given by
Fountoulakis [?] in his lecture notes but here a full proof is given.

1.3.2 Ergodic Theorem for Random Walks on Groups 

Let $G$ be a finite group and $\nu \in M_p(G)$ with support $\Sigma$. A
right-invariant random walk on $G$ is ergodic if and only if
$\Sigma \not\subset K$ for any proper subgroup $K$ of $G$ and
$\Sigma \not\subset Hx$ for any coset of any proper normal subgroup
$H \lhd G$.

In this case, $\pi$ is the unique, strict stationary distribution for $P$.
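The two conditions of the theorem can be explored on cyclic groups, where the support of $\nu^{\star k}$ is the $k$-fold sumset of $\Sigma$ and ergodicity means that some sumset is the whole group. The sketch below is restricted to $\mathbb{Z}_n$ written additively; the function name and the default search bound $(n-1)^2+1$ (Wielandt's bound for primitive matrices, assumed here rather than taken from the text) are illustrative choices.

```python
def support_power_covers(n, support, max_steps=None):
    """Return True if some k-fold sumset of `support` equals Z_n.

    On Z_n the walk driven by a measure with this support is ergodic exactly
    when such a k exists. The search stops at max_steps, by default
    (n - 1)^2 + 1, an assumed sufficient bound.
    """
    if max_steps is None:
        max_steps = (n - 1) ** 2 + 1
    current = {0}  # the walk starts at the identity
    for _ in range(max_steps):
        current = {(s + t) % n for s in support for t in current}
        if len(current) == n:
            return True
    return False


print(support_power_covers(5, {1, 4}))     # simple walk on Z_5
print(support_power_covers(6, {1, 5}))     # support inside the coset {1, 3, 5}
print(support_power_covers(6, {2, 4}))     # support inside the subgroup {0, 2, 4}
print(support_power_covers(6, {0, 1, 5}))  # adding loops breaks the periodicity
```

The second and third cases fail for exactly the two reasons named in the theorem: containment in a coset of a proper normal subgroup and containment in a proper subgroup.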

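Proof. Assume $\Sigma \subset K$, a proper subgroup of $G$. Then
$\langle\Sigma\rangle \subset K$ by closure in $K$; hence $\xi_k \in K$ for
all $k \in \mathbb{N}$. Let $s \in K$, $t \notin K$. Now for all
$n \in \mathbb{N}$, $p^{(n)}(s,t) = 0$. Hence $P$ is not ergodic.

Assume $\Sigma \subset Hx$ for some coset of a proper normal subgroup
$H \lhd G$. Now $\xi_0 \in He$ and $\xi_1 \in HxHe = Hx$, so by induction
$\xi_n \in (Hx)^n = Hx^n$, for all $n \in \mathbb{N}$. Let
$n \in \mathbb{N}$. Let $s \in G \setminus Hx^n$: $p^{(n)}(e,s) = 0$. Hence
$P$ is not ergodic.

Assume now $\Sigma \not\subset K$ for any proper subgroup $K$ of $G$ and
$\Sigma \not\subset Hx$ for any coset of any proper normal subgroup
$H \lhd G$.

Clearly the inclusions $\Sigma \subset \langle\Sigma\rangle \subset G$ hold
with $\langle\Sigma\rangle$ a subgroup of $G$. By assumption $\Sigma$ does
not lie in a proper subgroup, hence $\langle\Sigma\rangle = G$. Hence for all
$s, t \in G$, there exists $n(s,t) \in \mathbb{N}$ such that
$p^{(n(s,t))}(s,t) > 0$.

Let
$L_\Sigma(e) := \{(\sigma_{i_1}, \dots, \sigma_{i_N}) : e = \sigma_{i_N} \cdots \sigma_{i_1},\ \sigma_{i_m} \cdots \sigma_{i_1} \neq e \text{ if } m \leq N - 1;\ \sigma_{i_j} \in \Sigma\}$
be the set of all distinct minimal $\Sigma$-presentations of $e$.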
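Claim 1: If $|L_\Sigma(e)| = 1$, then $G = \langle\sigma\rangle$ is cyclic
and $\Sigma$ is in a coset of a proper normal subgroup.

Proof. If $|L_\Sigma(e)| = 1$ there is only one minimal
$\Sigma$-presentation of $e$. But if $\sigma_1 \neq \sigma_2 \in \Sigma$ then
$\sigma_1^{o(\sigma_1)}$ and $\sigma_2^{o(\sigma_2)}$ are two distinct minimal
$\Sigma$-presentations of $e$ (here $o(\sigma)$ denotes the order of
$\sigma$). Hence $\sigma_1 = \sigma_2$. Hence $\Sigma = \{\sigma\}$. But
$\langle\Sigma\rangle = G$, hence $G$ is cyclic and in particular
$\Sigma \subset \{e\}\sigma$, the coset of the proper normal subgroup
$\{e\}$.

Claim 2: Assume $|L_\Sigma(e)| > 1$. If $\Sigma$ is not contained in a coset
of a proper normal subgroup of $G$, then, where $L$ is the set of word
lengths of the elements of $L_\Sigma(e)$, $\gcd L = 1$.

Proof. Suppose $\gcd L = k > 1$. Then every $\Sigma$-presentation of $e$ has
length $0 \bmod k$. Let $N_k \subset G$ be the subgroup generated by all
elements of $G$ with a length $0 \bmod k$ $\Sigma$-presentation. Clearly
$e \in N_k$. Let $t \in G$. Suppose $t$ has a length $p \bmod k$
$\Sigma$-presentation. Then $t^{-1}$ has a length $-p \bmod k$
$\Sigma$-presentation, since $t^{-1}t = e$ has a length $0 \bmod k$
$\Sigma$-presentation. Let $n \in N_k$. By definition, $n$ has a length
$0 \bmod k$ $\Sigma$-presentation and so $t^{-1}nt$ has a length $0 \bmod k$
$\Sigma$-presentation. So $N_k$ is normal.

Let $\sigma \in \Sigma$, and suppose $\sigma \in N_k$. Then, writing $\sigma$
as a word of length $0 \bmod k$ and $\sigma^{-1}$ as a word of length
$-1 \bmod k$,

\[
\sigma\sigma^{-1} = e = (\sigma_{i_1} \cdots \sigma_{i_{mk}})\,\sigma^{-1},
\]

that is, $e$ would have a length $-1 \bmod k$ $\Sigma$-presentation, which is
not allowed. Hence $\sigma \notin N_k$, so $N_k$ is a proper normal subgroup
of $G$.

Let $\sigma_1 \in \Sigma$. Then for all $\sigma \in \Sigma$,
$\sigma\sigma_1^{-1} \in N_k$, as $\Sigma$-presentations of any
$\sigma^{-1}$ have length $-1 \bmod k$. Hence $\Sigma \subset N_k\sigma_1$
and this contradicts the assumption on $\Sigma$. Hence $\gcd L = 1$.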

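Let $S$ be the set of lengths of all⁴ distinct $\Sigma$-presentations of
$e$. As $L \subset S$, $\gcd S = 1$. Hence there exist
$l_1, \dots, l_m \in S$, $k_i \in \mathbb{Z}$ such that [22]:

\[
k_1 l_1 + \cdots + k_m l_m = 1 \tag{1.6}
\]

Let $l \in S$ and $n(e,s)$ be as above. Let

\[
M = l_1|k_1| + \cdots + l_m|k_m| \tag{1.7}
\]

and

\[
n_0(e,s) = lM + n(e,s) \tag{1.8}
\]

If $n \geq n_0(e,s)$, let

\[
r = \left\lfloor \frac{n - n(e,s)}{l} \right\rfloor,
\quad\text{so that}\quad n = n(e,s) + rl + a
\]

where $0 \leq a < l$ and $r \geq M$. Now as

\[
\sum_{i=1}^{m} k_i l_i = 1, \quad\text{and}\quad \sum_{i=1}^{m} l_i|k_i| = M,
\]

$n$ may be written

\[
n = n(e,s) + rl \underbrace{{} - lM + l\sum_{i=1}^{m} l_i|k_i|}_{=0} + a\sum_{i=1}^{m} l_i k_i
  = (r - M)l + \sum_{i=1}^{m} (l|k_i| + a k_i)\,l_i + n(e,s)
\]

where the $(l|k_i| + a k_i) \geq 0$. Let $x, y, \lambda \in \mathbb{N}$. Note
that the probability of going from $s$ to $t$ in $x + \lambda y$ steps is
certainly at least that of going from $s$ to $t$ in $x$ steps and then
returning to $t$ every $y$ steps $\lambda$ times:

\[
p^{(x + \lambda y)}(s,t) \geq p^{(x)}(s,t)\,\big(p^{(y)}(t,t)\big)^{\lambda} \tag{1.9}
\]

Hence as $l, l_i \in S$ (so that $p^{(l)}(e,e),\, p^{(l_i)}(e,e) > 0$) and
$p^{(n(e,s))}(e,s) > 0$;

\[
p^{(n)}(e,s) \geq p^{(n(e,s))}(e,s)\,\big[p^{(l)}(s,s)\big]^{r - M}
\prod_{i=1}^{m} \big[p^{(l_i)}(s,s)\big]^{l|k_i| + a k_i} > 0
\]

where $p^{(y)}(s,s) = p^{(y)}(e,e)$ by right invariance.

Now let $n_0$ be the maximum of $n_0(e,s)$ as $s$ runs over $G$. Let
$s, t \in G$. By right invariance

\[
p^{(n)}(s,t) = p^{(n)}(e, ts^{-1}) > 0, \quad\text{for } n \geq n_0
\]

Hence $P$ is ergodic.

⁴ Not just minimal presentations.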
Chapter 2

Distance to Random 


2.1 Introduction 

The previous chapter demonstrates that under mild conditions a random
walk on a group converges to the random distribution. Therefore, initially
the walk is ‘far’ from random and eventually the walk is ‘close’ to random.
An appropriate question, therefore, is: given a control $\varepsilon > 0$,
how large should $k$ be so that the walk is $\varepsilon$-close to random
after $k$ steps? The first problem here is to have a measure of ‘close to
random’. This chapter introduces a few measures of ‘closeness to random’,
discusses the relationship between them and presents some bounds. In the rest
of the work, all walks are assumed ergodic unless stated otherwise.

Let $\nu$ and $\mu \in M_p(G)$. The convolution of $\nu$ and $\mu$ is the
probability

\[
\nu \star \mu(s) := \sum_{t \in G} \nu(st^{-1})\,\mu(t) \tag{2.1}
\]

In particular denote $\nu^{\star(k+1)} := \nu \star \nu^{\star k}$. The
distribution of a random walk after one step is given by $\nu$. If
$s \in G$, then the walk can go to $s$ in two steps by going to some
$t \in G$ after one step and going from there to $s$ in the next. The
probability of going from $t$ to $s$ is given by the probability of choosing
$st^{-1}$, i.e. $\nu(st^{-1})$. By summing over all intermediate steps
$t \in G$, and noting that $\nu \star \delta^e = \nu$, it is seen that if
$\{\xi_k\}_{k=0}^{n}$ is a random walk on $G$ driven by $\nu$, then
$\nu^{\star k}$ is the probability distribution of $\xi_k$. In terms of the
stochastic operator $P$ induced by $\nu \in M_p(G)$, given any
$\mu \in M_p(G)$, $\mu P = \nu \star \mu$.
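Convolution powers are easy to compute on a cyclic group, where $st^{-1}$ becomes $s - t \bmod n$. The sketch below assumes $G = \mathbb{Z}_5$ with the simple walk $\nu(\pm 1) = 1/2$, takes $\nu^{\star 0}$ to be $\delta^e$ as a convention, and checks the identity $\mu P = \nu \star \mu$ for one made-up $\mu$; all names are illustrative.

```python
from fractions import Fraction as F


def convolve(nu, mu, n):
    """(nu * mu)(s) = sum_t nu(s - t) mu(t) on Z_n."""
    return [sum(nu[(s - t) % n] * mu[t] for t in range(n)) for s in range(n)]


def conv_power(nu, k, n):
    """k-fold convolution power, starting from the Dirac mass at 0."""
    result = [F(1)] + [F(0)] * (n - 1)
    for _ in range(k):
        result = convolve(nu, result, n)
    return result


n = 5
nu = [F(0), F(1, 2), F(0), F(0), F(1, 2)]   # nu(+1) = nu(-1) = 1/2

# Distribution of the walk after two steps.
two_steps = conv_power(nu, 2, n)
print(two_steps)

# Stochastic operator p(s, t) = nu(t - s) and a hypothetical start mu.
P = [[nu[(t - s) % n] for t in range(n)] for s in range(n)]
mu = [F(1, 2), F(1, 2), F(0), F(0), F(0)]
mu_P = [sum(mu[y] * P[y][x] for y in range(n)) for x in range(n)]
print(mu_P == convolve(nu, mu, n))
```

After two steps the walk is at 0 with probability 1/2 and at $\pm 2$ with probability 1/4 each, matching the ‘sum over intermediate steps’ description.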
2.2 Measures of Randomness


The preceding remarks indicate that $\nu^{\star k} \to \pi$; thus a measure
of closeness to random can be defined by defining a metric on $M_p(G)$ or
putting a norm on $\mathbb{R}^{|G|} \supset M_p(G)$. Then a precise
mathematical question may be asked: given $\varepsilon > 0$, how large should
$k$ be so that $\|\nu^{\star k} - \pi\| < \varepsilon$ or
$d(\nu^{\star k}, \pi) < \varepsilon$? Straightaway it is clear that any of
the $p$-norms may be used. Also multiples of $p$-norms may be used; for
example, Diaconis & Saloff-Coste [?] introduce the distance
$d_p(k) := |G|^{1 - 1/p}\,\|\nu^{\star k} - \pi\|_p$.

Another notion of closeness to random, although not a metric, is that of
separation distance:

\[
s(k) := \max_{t \in G}\,\big(1 - |G|\,\nu^{\star k}(t)\big) \tag{2.2}
\]

Clearly $s(k) \in [0,1]$ with $s(k) = 1$ if and only if
$\nu^{\star k}(g) = 0$ for some $g$; and $s(k) = 0$ if and only if
$\nu^{\star k} = \pi$. The separation distance is submultiplicative in the
sense that $s(k + l) \leq s(k)s(l)$, for $k, l \in \mathbb{N}$ [1]. This
immediately implies that $s(nk) \leq [s(k)]^n$. Suppose however that
$\nu^{\star k}(g) = 0$ for some $g \in G$. Then $s(k) = 1$ and
$s(nk) \leq 1$, which is useless. However, because the walk is ergodic there
exists a time $n_0$ when $\nu^{\star n_0}$ is supported on the entire group.
Let $L := \min\{\nu^{\star n_0}(s) : s \in G\}$. Then
$s(n_0) = 1 - |G|L$, thence $s(kn_0) \leq (1 - |G|L)^k$. An example where
this bound is easily applied is the simple walk on $\mathbb{Z}_n$, $n$ odd,
where $\nu(\pm 1) = 1/2$. Then $n_0 = n - 1$, $L = 2^{1-n}$ and thence
$s(k(n - 1)) \leq (1 - n\,2^{1-n})^k$.
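The bound for the simple walk on $\mathbb{Z}_n$ can be checked exactly for a small odd $n$. The sketch below assumes $n = 5$ and takes the separation distance to be $\max_t (1 - |G|\,\nu^{\star k}(t))$, the form consistent with $s(n_0) = 1 - |G|L$; function names are illustrative.

```python
from fractions import Fraction as F


def conv_power(nu, k, n):
    """Distribution after k steps of the walk on Z_n driven by nu."""
    dist = [F(1)] + [F(0)] * (n - 1)   # start at the identity 0
    for _ in range(k):
        dist = [sum(nu[(s - t) % n] * dist[t] for t in range(n))
                for s in range(n)]
    return dist


def separation(dist):
    """Separation distance from uniform: max_t (1 - |G| * dist(t))."""
    size = len(dist)
    return max(1 - size * x for x in dist)


n = 5
nu = [F(0), F(1, 2), F(0), F(0), F(1, 2)]   # simple walk on Z_5
n0 = n - 1
L = min(conv_power(nu, n0, n))              # smallest mass at time n0
bound = 1 - n * L                           # equals s(n0)

for k in range(1, 6):
    s_value = separation(conv_power(nu, k * n0, n))
    print(k, float(s_value), float(bound ** k))
```

For $n = 5$ this gives $L = 2^{1-5} = 1/16$ and $s(4) = 11/16$, and each later value $s(4k)$ sits below $(11/16)^k$.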

A further measure of randomness is that of the average Shannon Entropy
of the distribution; $H(\mu) = \sum_{t \in G} \mu(t)\log\,(1/\mu(t))$. A quick
calculation shows that $H(\delta^e) = 0$, $H(\pi) = \log|G|$; and also that
$H(\nu^{\star k})$ increases to $\log|G|$ monotonically [?]. Therefore
$\sigma(k) := \log|G| - H(\nu^{\star k})$ is a measure of closeness to
random. A lower bound, adapted from [2], is
$\sigma(k) \geq (1 - k)\log|G| + k\,\sigma(1)$.

The default measure of closeness to random in this work, however, is
variation distance. If $\mu, \nu \in M_p(G)$, their variation distance is

\[
\|\mu - \nu\| := \max_{A \subset G} |\mu(A) - \nu(A)| \tag{2.3}
\]
Diaconis [12] notes an interpretation of variation distance due to Paul
Switzer. Consider $\mu, \nu \in M_p(G)$. Given a single observation of $G$,
sampled from $\mu$ or $\nu$ with probability 1/2, guess whether the
observation, $o$, was sampled from $\mu$ or $\nu$. The classical strategy
presented here gives the probability of being correct as
$\frac{1}{2}(1 + \|\mu - \nu\|)$:

1. Evaluate $\mu(o)$ and $\nu(o)$.

2. If $\mu(o) \geq \nu(o)$, choose $\mu$.

3. If $\nu(o) > \mu(o)$, choose $\nu$.

To see this is true, let $\{\mu > \nu\}$ be the set
$\{t \in G : \mu(t) > \nu(t)\}$. Suppose $o$ is sampled from $\mu$. Then the
strategy is correct if $o \in \{\mu = \nu\}$ or $o \in \{\mu > \nu\}$:

\[
P[\text{guessing correctly} \mid \mu]
= P[o \in \{\mu = \nu\} \mid \mu] + P[o \in \{\mu > \nu\} \mid \mu]
\]

with a similar expression for $P[\text{guessing correctly} \mid \nu]$. Note
that $P[o \in \{\mu = \nu\} \mid \mu] = \mu(\{\mu = \nu\}) = \nu(\{\mu = \nu\})$
and also $P[o \in \{\mu > \nu\} \mid \mu] = \mu(\{\mu > \nu\})$ (and
similarly for $o \in \{\mu < \nu\}$). Thus

\[
P[\text{guessing correctly}]
= \tfrac{1}{2}P[\text{guessing correctly} \mid \mu] + \tfrac{1}{2}P[\text{guessing correctly} \mid \nu]
= \tfrac{1}{2}\big(\nu(\{\mu = \nu\}) + \mu(\{\mu > \nu\})\big) + \tfrac{1}{2}\,\nu(\{\mu < \nu\})
\]

It is easily shown that

\[
\|\mu - \nu\| = \mu(\{\mu > \nu\}) - \nu(\{\mu > \nu\}).
\]

Hence

\[
P[\text{guessing correctly}]
= \tfrac{1}{2}\|\mu - \nu\| + \tfrac{1}{2}\underbrace{\big(\nu(\{\mu = \nu\}) + \nu(\{\mu > \nu\}) + \nu(\{\mu < \nu\})\big)}_{=1}
= \tfrac{1}{2}\big(1 + \|\mu - \nu\|\big)
\]
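Switzer's interpretation can be confirmed exactly on a small example. The sketch below computes the variation distance by brute force over all subsets, as in (2.3), and the success probability of the three-step strategy; the two measures on a four-point set are invented for illustration and the function names are assumptions.

```python
from fractions import Fraction as F
from itertools import combinations


def variation_distance(mu, nu):
    """max over subsets A of |mu(A) - nu(A)| (brute force, small sets only)."""
    points = range(len(mu))
    best = F(0)
    for size in range(len(mu) + 1):
        for A in combinations(points, size):
            gap = abs(sum(mu[t] for t in A) - sum(nu[t] for t in A))
            best = max(best, gap)
    return best


def probability_correct(mu, nu):
    """Success probability of the strategy: guess mu when mu(o) >= nu(o)."""
    total = F(0)
    for m, v in zip(mu, nu):
        if m >= v:
            total += F(1, 2) * m   # o drawn from mu (prob 1/2), guess is mu
        else:
            total += F(1, 2) * v   # o drawn from nu (prob 1/2), guess is nu
    return total


# Hypothetical measures on a four-point set.
mu = [F(1, 2), F(1, 4), F(1, 4), F(0)]
nu = [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]

tv = variation_distance(mu, nu)
print(tv, probability_correct(mu, nu), (1 + tv) / 2)
```

The last two printed values coincide, as the derivation predicts.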


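Also the separation distance controls the variation distance as

\[
\|\nu^{\star k} - \pi\|
= \sum_{t\,:\,\nu^{\star k}(t) < \pi(t)} \big(\pi(t) - \nu^{\star k}(t)\big)
= \sum_{t\,:\,\nu^{\star k}(t) < \pi(t)} \pi(t)\big(1 - |G|\,\nu^{\star k}(t)\big)
\leq s(k).
\]

It is a straightforward exercise, however, to show that $\|\mu - \nu\|$ is
simply half of the usual $\ell^1$-distance $\|\mu - \nu\|_1$. Hence, with $P$
doubly stochastic ($\|P\|_{1 \to 1} = 1$ as column sums are 1), the quick
calculation

\[
\|\nu^{\star(k+1)} - \pi\|_1 = \|(\nu^{\star k} - \pi)P\|_1
\leq \|\nu^{\star k} - \pi\|_1\,\|P\|_{1 \to 1} = \|\nu^{\star k} - \pi\|_1
\]

shows that $\|\nu^{\star k} - \pi\|$ is non-increasing in $k$.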
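Both facts in this paragraph, the half-$\ell^1$ formula and the monotonicity in $k$, can be observed on the simple walk on $\mathbb{Z}_5$. The sketch below is an illustration under that assumed example; it evaluates the maximum in the definition at the event $A = \{\mu > \nu\}$, where it is attained.

```python
from fractions import Fraction as F


def half_l1(mu, nu):
    """Half of the l^1 distance between mu and nu."""
    return sum(abs(m - v) for m, v in zip(mu, nu)) / 2


def max_event(mu, nu):
    """mu(A) - nu(A) at A = {mu > nu}, the event attaining the maximum."""
    return sum(m - v for m, v in zip(mu, nu) if m > v)


def step(dist):
    """One step of the simple walk on Z_n: move +1 or -1 with probability 1/2."""
    n = len(dist)
    return [(dist[(s - 1) % n] + dist[(s + 1) % n]) / 2 for s in range(n)]


n = 5
uniform = [F(1, n)] * n
dist = [F(1)] + [F(0)] * (n - 1)   # start at the identity

distances = []
for k in range(13):
    assert half_l1(dist, uniform) == max_event(dist, uniform)
    distances.append(half_l1(dist, uniform))
    dist = step(dist)

print([float(d) for d in distances])
```

The printed sequence starts at $4/5$ and never increases from one step to the next.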


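At this juncture Aldous [2] denotes by $\tau(\varepsilon)$ the time to get
$\varepsilon$-close to random:
$\tau(\varepsilon) := \min\{k : \|\nu^{\star k} - \pi\| < \varepsilon\}$.
Call $\tau := \tau(1/2e)$ the mixing time. The reason the random walk driven
by $\nu \in M_p(G)$ is defined to start deterministically at $e$ is that, due
to right-invariance, a random walk driven by the same measure starting
deterministically at $g \neq e$ will converge to random at the same rate.
Also, if $\xi_0$ is distributed as $\theta = \sum_{t \in G} a_t\delta^t$ then
the walk looks like $\sum_{t \in G} a_t\xi^t$, where $\xi^t$ is the walk
which begins deterministically at $t$. All these constituent walks converge
at the same rate, however, as might be expected:

\[
\|\theta P^k - \pi\|
= \frac{1}{2}\sum_{s \in G}\Big|\sum_{t \in G} a_t\,\delta^t P^k(s) - \pi(s)\Big|
= \frac{1}{2}\sum_{s \in G}\Big|\sum_{t \in G} a_t\big(\delta^t P^k(s) - \pi(s)\big)\Big|
\leq \sum_{t \in G} a_t\Big(\frac{1}{2}\sum_{s \in G}\big|\delta^t P^k(s) - \pi(s)\big|\Big)
= \|\nu^{\star k} - \pi\| \tag{2.4}
\]

Certainly there is equality if $\theta$ is a Dirac measure; at the other
extreme, if $\theta$ is the random distribution $\pi$, the left-hand side is
zero.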


2.3 Spectral Analysis 

In the case of reversible random walks, where $\pi$ is the random
distribution, $\pi(g)\,p(g,h) = p(h,g)\,\pi(h)$. Hence the driving
probability is symmetric:

\[
p(g,h) = p(h,g) \;\Rightarrow\; \nu(hg^{-1}) = \nu(gh^{-1})
\;\Rightarrow\; \nu(s) = \nu(s^{-1}), \quad \forall\, s \in G
\]

Also in the $\{\delta_t : t \in G\}$ basis the matrix representation of the
stochastic operator is symmetric: $p(x,y) = p(y,x)$. Let
$\langle\cdot|\cdot\rangle$ be the inner product on $F(G)$:

\[
\langle\phi|\psi\rangle := \sum_{s \in G} \phi(s)\,\overline{\psi(s)}
\]

When the walk is reversible:

\[
\langle P\phi|\psi\rangle
= \sum_{s \in G}\Big(\sum_{t \in G} p(s,t)\,\phi(t)\Big)\overline{\psi(s)}
= \sum_{t \in G} \phi(t)\,\overline{\Big(\sum_{s \in G} p(t,s)\,\psi(s)\Big)}
= \langle\phi|P\psi\rangle
\]

and so the stochastic operator is self-adjoint. By the spectral theorem for
self-adjoint maps $P$ has a (left) eigenbasis
$B = \{u_1, \dots, u_{|G|}\}$. Suppose further that $B$ is normalised such
that $\delta^e = \sum_t a_t u_t$ with $u_1 = \pi$ and $a_1 = 1$ (in fact for
any $\theta \in M_p(G)$ this normalisation is unique. Let
$v \in \mathbb{R}^{|G|}$. Call the sum of the entries of $v$ its weight. The
eigenvectors $u_t$, $t \neq 1$, are orthogonal to $\pi$. Thence these
eigenvectors have weight 0, so in order for the linear combination to be a
probability distribution the weight needs to be 1, hence $a_1$ must be 1). If
$P$ is ergodic, then the eigenvalue 1 has multiplicity 1. A quick calculation
shows that if $\lambda_1 = 1$, then also $|\lambda_t| \leq 1$, for all
$t \neq 1$. Using an elegant graph-theoretic argument, Ceccherini-Silberstein
et al. [7] show that if $P$ is ergodic then $-1$ is not an eigenvalue.
Therefore in the case of reversible walks (real eigenvalues),
$|\lambda_t| < 1$, for all $t \neq 1$ (this is also a consequence of the
Perron-Frobenius Theorem), and then

\[
\nu^{\star k} = \pi + \sum_{t \neq 1} a_t\lambda_t^k u_t \tag{2.5}
\]


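Therefore, letting $\lambda_* := \max\{|\lambda_t| : t \neq 1\}$:

\[
\|\nu^{\star k} - \pi\| = \frac12 \Big\| \sum_{t \neq 1} a_t \lambda_t^k u_t \Big\|_1
 = \frac12 \sum_{s \in G} \Big| \sum_{t \neq 1} a_t \lambda_t^k u_t(s) \Big|
 \leq \underbrace{\frac12 \sum_{s \in G} \sum_{t \neq 1} |a_t u_t(s)|}_{=:C} \; \lambda_*^k.
\]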

Hence the rate of convergence is controlled by the second highest eigenvalue
in magnitude. In Corollary 2.3.3 an explicit $C$ is given. The importance of
the second largest eigenvalue is a mantra in Markov chain theory; however,
it is only in the reversible case that the importance is so obvious.
Suppose now that $P$ is a not-necessarily-reversible stochastic operator.
Following [29], put $P$ in Jordan normal form:

\[ \begin{pmatrix} 1 & & & 0 \\ & J_2 & & \\ & & \ddots & \\ 0 & & & J_m \end{pmatrix} \]

where the Jordan blocks $J_i$ have form:

\[ J_i = \begin{pmatrix} \lambda_i & 1 & & 0 \\ 0 & \lambda_i & \ddots & \\ \vdots & & \ddots & 1 \\ 0 & \cdots & 0 & \lambda_i \end{pmatrix} \]

and have size equal to the algebraic multiplicity of $\lambda_i$. Note the first entry
of $P$ will be just 1 as 1 is an eigenvalue of multiplicity 1. The Jordan block
$J_i$ is the sum of the diagonal matrix $\lambda_i I$ and the super diagonal, and thus
nilpotent, matrix $N_i$. With $P^k = \operatorname{diag}(1, J_2^k, \dots, J_m^k)$, and noting $N_i^{d_i} = 0$
where $d_i$ is the multiplicity of $\lambda_i$:

\[ J_i^k = (\lambda_i I + N_i)^k = \sum_{j=0}^{d_i - 1} \binom{k}{j} \lambda_i^{k-j} N_i^j. \]

Now, for $j < d_i$, $N_i^j$ is the matrix with ones on the $j$th diagonal above the
main diagonal. Hence $J_i^k$ is a matrix whose lower diagonal entries are zero
and which has equal entries along this '$j$th diagonal', namely $\binom{k}{j}\lambda_i^{k-j}$.
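Hence the magnitude of the entries along the $j$th diagonal is bounded by
(as $|\lambda_i| < 1$):

\[ \Big| \binom{k}{j} \lambda_i^{k-j} \Big| \leq \binom{k}{j} |\lambda_i|^{k - d_i}. \]

The remaining manipulations are dependent on the relation of $k$ to $d_i$.
Assuming $k > 2d_i$ for example:

\[ \big| (J_i^k)_{rs} \big| \leq \binom{k}{d_i} |\lambda_i|^{k - d_i}. \]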
In Jordan normal form, $P^k$ converges to the matrix with 1 in the (1,1) entry
and zero elsewhere. Clearly it is the block corresponding to the second
largest eigenvalue in magnitude which is the slowest to converge and hence
this eigenvalue controls convergence.

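Taking the approach of [7], more explicit bounds for the reversible case
may be found. If the walk is reversible then $P$ has a (right) orthonormal
basis $B = \{v_t : t \in G\}$ with corresponding eigenvalues $\{\lambda_t : t \in G\}$.
Let $v_1$ be the constant function with value 1 (so that $\lambda_1 = 1$). Put
$\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_{|G|})$. Now

\[ Pv_s(g) = \sum_t p(g,t) v_s(t) = v_s(g)\lambda_s \iff PU = U\Lambda, \]

where $U = [v_1 | \cdots | v_{|G|}]$. From orthonormality

\[ U^T U = |G| I. \]

As a matrix of eigenvectors, $U$ is invertible with $U^{-1} = U^T/|G|$. Hence
$P = U\Lambda U^T/|G|$, and so:

\[ P^k = \underbrace{\frac{U\Lambda U^T}{|G|} \cdots \frac{U\Lambda U^T}{|G|}}_{k \text{ copies}} = \frac{U\Lambda^k U^T}{|G|}. \]

Or, in terms of coordinates,

\[ p^{(k)}(g,h) = \frac{1}{|G|} \sum_t \lambda_t^k v_t(g) v_t(h). \]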
2.3.1 Proposition

Suppose $\nu$ is symmetric. Then in the notation above

\[ \|\nu^{\star k} - \pi\|_2^2 = \frac{1}{|G|} \sum_{t \neq 1} v_t(e)^2 \lambda_t^{2k}. \tag{2.6} \]
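Proof. Since $P^k = U\Lambda^k U^T/|G|$, $\nu^{\star k}(s) = \frac{1}{|G|}\sum_t \lambda_t^k v_t(e) v_t(s)$, and the
$t = 1$ term is $\pi(s) = 1/|G|$. By definition

\[
\begin{aligned}
\|\nu^{\star k} - \pi\|_2^2 &= \sum_{s \in G} \big( \nu^{\star k}(s) - \pi(s) \big)^2
 = \sum_{s \in G} \Big( \frac{1}{|G|} \sum_{t \neq 1} \lambda_t^k v_t(e) v_t(s) \Big)^2 \\
&= \frac{1}{|G|^2} \sum_{t_1, t_2 \neq 1} \lambda_{t_1}^k \lambda_{t_2}^k v_{t_1}(e) v_{t_2}(e) \sum_{s \in G} v_{t_1}(s) v_{t_2}(s).
\end{aligned}
\]

But $U^T U/|G| = I$; equivalently

\[ \sum_{s \in G} v_{t_1}(s) v_{t_2}(s)/|G| = \delta_{t_1}(t_2), \]

and so

\[ \|\nu^{\star k} - \pi\|_2^2 = \frac{1}{|G|} \sum_{t \neq 1} v_t(e)^2 \lambda_t^{2k} \quad \bullet \]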


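2.3.2 Corollary: Upper Bound Lemma

Using the same notation, where $\|\cdot\|$ is the variation distance:

\[ \|\nu^{\star k} - \pi\|^2 \leq \frac14 \sum_{t \neq 1} v_t(e)^2 \lambda_t^{2k}. \tag{2.7} \]

Proof. The proof is a rudimentary application of the Cauchy-Schwarz
Inequality:

\[ 4\|\nu^{\star k} - \pi\|^2 = \Big( \sum_{s \in G} 1 \cdot |\nu^{\star k}(s) - \pi(s)| \Big)^2 \leq |G| \, \|\nu^{\star k} - \pi\|_2^2 = \sum_{t \neq 1} v_t(e)^2 \lambda_t^{2k} \quad \bullet \]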
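2.3.3 Corollary

In the same notation:

\[ \|\nu^{\star k} - \pi\|^2 \leq \frac{|G| - 1}{4} \lambda_*^{2k}. \tag{2.8} \]

Proof. Since $|\lambda_t| \leq \lambda_*$ for all $t \neq 1$,

\[ \|\nu^{\star k} - \pi\|^2 \leq \frac14 \lambda_*^{2k} \sum_{t \neq 1} v_t(e)^2. \]

Note that the eigenvectors of symmetric matrices can be chosen to be
real-valued [T], so that $v_t v_t^* = v_t^2$. Also $U^T U = |G| I$ and hence $U U^T = |G| I$;
thus

\[ \sum_{t \in G} v_t(e)^2 = |G| \implies 1 + \sum_{t \neq 1} v_t(e)^2 = |G| \quad \bullet \]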
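
The identity (2.6) and the two corollaries can be checked numerically on a small example. The sketch below is illustrative only: it assumes the group is $\mathbb{Z}_7$ and picks a hypothetical lazy symmetric driving probability, neither of which comes from the text. For a symmetric walk on $\mathbb{Z}_n$ the real eigenvectors are scaled cosines and sines, which is what `real_eigenbasis` builds.

```python
import math

def convolve(mu, nu):
    """Convolution on Z_n: (mu * nu)(s) = sum_t mu(s - t) nu(t)."""
    n = len(mu)
    return [sum(mu[(s - t) % n] * nu[t] for t in range(n)) for s in range(n)]

def real_eigenbasis(n):
    """Real eigenvectors of any symmetric walk on Z_n, scaled so that
    (1/n) * sum_s v_a(s) v_b(s) = delta_ab."""
    basis = [[1.0] * n]                                  # v_1: the constant function
    for t in range(1, n // 2 + 1):
        cos_t = [math.cos(2 * math.pi * t * s / n) for s in range(n)]
        if 2 * t == n:                                   # only for even n: already norm 1
            basis.append(cos_t)
        else:
            basis.append([math.sqrt(2) * x for x in cos_t])
            basis.append([math.sqrt(2) * math.sin(2 * math.pi * t * s / n)
                          for s in range(n)])
    return basis

def eigenvalue(nu, v):
    """(Pv|v) with p(g, h) = nu(h - g); equals the eigenvalue since (v|v) = 1."""
    n = len(nu)
    Pv = [sum(nu[(h - g) % n] * v[h] for h in range(n)) for g in range(n)]
    return sum(a * b for a, b in zip(Pv, v)) / n

n, k = 7, 5                                              # hypothetical example values
nu = [0.0] * n
nu[0], nu[1], nu[n - 1] = 0.5, 0.25, 0.25                # lazy symmetric walk (assumption)

walk = nu[:]
for _ in range(k - 1):
    walk = convolve(walk, nu)                            # walk = nu^{*k}

basis = real_eigenbasis(n)
lams = [eigenvalue(nu, v) for v in basis]
l2_sq = sum((p - 1 / n) ** 2 for p in walk)              # left side of (2.6)
spectral = sum(v[0] ** 2 * lam ** (2 * k) for v, lam in zip(basis[1:], lams[1:]))
tv = 0.5 * sum(abs(p - 1 / n) for p in walk)             # variation distance
lam_star = max(abs(lam) for lam in lams[1:])
crude = (n - 1) / 4 * lam_star ** (2 * k)                # right side of (2.8)
```

Here `l2_sq` should agree with `spectral / n`, while `tv ** 2`, `spectral / 4` and `crude` should appear in increasing order, mirroring (2.6), (2.7) and (2.8).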


When $\nu$ is symmetric, the associated stochastic operator, $P$, is symmetric
and hence has real eigenvalues which can be ordered $1 = \lambda_1 > \lambda_2 \geq \cdots \geq
\lambda_{|G|} > -1$. So now $\lambda_* = |\lambda_2|$ or $|\lambda_{|G|}|$. Of course, if the spectrum of $P$
can be calculated then these bounds are immediately applicable, however 
more often one must do with estimates. Diaconis and Saloff-Coste m has 
many examples. Lemma 1 in that paper is a standard result in the field and 
is proved by consideration of the probability $\nu' = (\nu - \nu(e)\delta^e)/(1 - \nu(e))$;
however, a quick application of Gershgorin's circle theorem [24] shows the
$\lambda_{|G|} \geq -1 + 2\nu(e)$ result also. As the Gershgorin result is mentioned in
the sequel, and not typically used by the random walk community, it is 
presented here: 

2.3.4 Gershgorin’s Circle Theorem 

Let $A$ be a complex $n \times n$ matrix with entries $a_{ij}$. Let $R_i = \sum_{j \neq i} |a_{ij}|$ be
the sum of the absolute values of the entries in the $i$th row, excluding the
diagonal element. If $B[a_{ii}, R_i]$ is the closed disc centered at $a_{ii}$ with radius
$R_i$, then each of the eigenvalues of $A$ is contained in at least one of the
$B[a_{ii}, R_i]$.
Proof. Let $\lambda$ be an eigenvalue of $A$ with eigenvector $v$. Let $|v(k)| = \max_j |v(j)|$.
Now

\[ Av(k) = \sum_{j=1}^n a_{kj} v(j) = \lambda v(k). \]

That is

\[ \sum_{j \neq k} a_{kj} v(j) = \lambda v(k) - a_{kk} v(k). \]

Divide both sides by $v(k)$:

\[ \lambda - a_{kk} = \frac{\sum_{j \neq k} a_{kj} v(j)}{v(k)}. \]

Now as $|v(j)| \leq |v(k)|$:

\[ \left| \frac{\sum_{j \neq k} a_{kj} v(j)}{v(k)} \right| \leq \sum_{j \neq k} |a_{kj}| \left| \frac{v(j)}{v(k)} \right| \leq \sum_{j \neq k} |a_{kj}| = R_k. \]

In other words $|\lambda - a_{kk}| \leq R_k$ •


Note that the diagonal entries of a stochastic operator driven by $\nu \in
M_p(G)$ are all $\nu(e)$. Hence the radii, $R_t$, are all equal to $1 - \nu(e)$. The theorem
says, for any eigenvalue of the stochastic operator, $\lambda$, $|\lambda - \nu(e)| \leq 1 - \nu(e)$.
Note $\operatorname{Tr} P = |G|\nu(e)$. If $P$ is put in Jordan form, since the trace is basis
independent, it is found that $\operatorname{Tr} P = \sum_t \lambda_t$. Hence the average of the
eigenvalues (see the footnote) is equal to $\nu(e)$. Therefore, as 1 is an eigenvalue,
there are eigenvalues less than $\nu(e)$, i.e. $\lambda < \nu(e)$. In the symmetric case,
therefore, $\lambda_{|G|} - \nu(e) < 0$ so that $-\lambda_{|G|} + \nu(e) \leq 1 - \nu(e)$, thus
$\lambda_{|G|} \geq -1 + 2\nu(e)$. In the general, not-necessarily-symmetric case, the
eigenvalues are not necessarily real. However, with $|\lambda - \nu(e)| \leq 1 - \nu(e)$, if
$\nu(e) > 1/2$, then the eigenvalues are bounded away from zero so that the
stochastic operator, $P$, is invertible.


Footnote: it would be interesting to try to apply this to obtain a bound for $\|\nu^{\star k} - \pi\|$.
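
As an illustration of the Gershgorin estimate, the following sketch checks the three consequences drawn above on a small cyclic group. It assumes the group is $\mathbb{Z}_5$ and uses a hypothetical, non-symmetric driving probability with $\nu(e) > 1/2$; the eigenvalues of the stochastic operator of a walk on $\mathbb{Z}_n$ are the values $\sum_s \nu(s) e^{2\pi i ts/n}$, which is what `circulant_eigenvalues` computes.

```python
import cmath

def circulant_eigenvalues(nu):
    """Eigenvalues of the stochastic operator p(g, h) = nu(h - g) on Z_n."""
    n = len(nu)
    return [sum(nu[s] * cmath.exp(2j * cmath.pi * t * s / n) for s in range(n))
            for t in range(n)]

nu = [0.6, 0.3, 0.0, 0.0, 0.1]        # hypothetical driving probability, nu(e) = 0.6
eigs = circulant_eigenvalues(nu)

radius = 1 - nu[0]                    # every Gershgorin radius equals 1 - nu(e)
in_disc = all(abs(lam - nu[0]) <= radius + 1e-12 for lam in eigs)
mean_eig = sum(eigs) / len(eigs)      # Tr P / |G|, which should equal nu(e)
min_modulus = min(abs(lam) for lam in eigs)   # at least 2 nu(e) - 1 when nu(e) > 1/2
```

With $\nu(e) = 0.6$ every eigenvalue should lie in the disc of radius $0.4$ about $0.6$, so none has modulus below $0.2$ and the operator is invertible.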
2.4 Comparison Techniques

Whilst some random walks yield easily to analysis, others do not. There are 
a number of techniques, due to Diaconis & Saloff-Coste m, however, that 
allow comparison with a simpler walk. Often the continuous analogue of a 
discrete random walk yields readily to analysis. Diaconis &: Saloff-Coste m 
present, in the symmetric case, the most general relationship between the 
discrete and continuous time version of a given random walk. This paper 
also uses Dirichlet forms and the Courant minimax principle to estimate 
eigenvalues on a complicated walk from a simpler version. 

2.5 Lower Bounds 

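The definition of variation distance immediately gives a technique for
generating lower bounds. Given a test set $B \subset G$, immediately
$\|\nu^{\star k} - \pi\| \geq |\nu^{\star k}(B) - \pi(B)|$. A very simple application uses the fact
that $|\operatorname{supp}(\nu^{\star k})| \leq |\Sigma|^k$. Let $A_k \subset G$ be the set where $\nu^{\star k}$ vanishes. Clearly

\[ |\nu^{\star k}(A_k) - \pi(A_k)| = \pi(A_k) \geq \frac{1}{|G|}\big(|G| - |\Sigma|^k\big) = 1 - \frac{|\Sigma|^k}{|G|}. \]

Another elementary method for generating a lower bound using a test
function is apparent via

\[ \|\nu^{\star k} - \pi\| = \frac12 \max_{\|\phi\|_\infty \leq 1} \Big| \sum_{t \in G} \phi(t)\big(\nu^{\star k}(t) - \pi(t)\big) \Big|. \tag{2.9} \]

The discussion in Section 6.2 implies that if the (right) eigenvector $v_s$ is
normalised to have $\|v_s\|_\infty = 1$, then $v_s$ will have expectation zero under the
random distribution, $\sum_t \pi(t) v_s(t) = 0$.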
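
The support-counting bound is simple enough to evaluate directly. The sketch below is a hedged illustration: the function name is hypothetical, and the example assumes random transpositions of a 52-card deck, where the support of the driving probability has $1 + \binom{52}{2} = 1327$ elements and $|G| = 52!$.

```python
import math

def support_lower_bound(group_order, support_size, k):
    """Lower bound 1 - |Sigma|^k / |G| on the variation distance after k steps.

    The bound is vacuous once |Sigma|^k exceeds |G|, so it is clamped at 0."""
    return max(0.0, 1 - support_size ** k / group_order)

deck_order = math.factorial(52)        # |S_52|
support = 1 + math.comb(52, 2)         # identity plus all transpositions (assumption)

# Smallest k at which this crude bound no longer forces distance above 1/2.
first_uninformative_k = next(
    k for k in range(1, 200) if support_lower_bound(deck_order, support, k) < 0.5
)
```

The bound only uses how many group elements are reachable, so it is far from sharp; it is a cheap sanity check rather than a mixing-time estimate.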


2.5.1 Proposition

Let $u_\lambda$ be a real left eigenvector with eigenvalue $\lambda \neq 1$ and normalised such
that $\pi + u_\lambda \in M_p(G)$. Then

\[ \|\nu^{\star k} - \pi\| \geq \frac12 \|u_\lambda\|_1 |\lambda|^k. \]

Proof. Let $\theta = \pi + u_\lambda$. Using the fact that a non-Dirac initial distribution $\theta$
converges at least as fast as any Dirac measure (see (2.4)), it is clear that
$\|\theta P^k - \pi\|$ is a lower bound for $\|\nu^{\star k} - \pi\|$:

\[ \|\theta P^k - \pi\| = \|\pi + \lambda^k u_\lambda - \pi\| = \frac12 \|u_\lambda\|_1 |\lambda|^k \quad \bullet \]


2.6 Volume & Diameter Bounds


By elegant analysis of the properties of the geometry of the random walk, 
bounds may be put on the eigenvalues of P and applications of the bounds 
of this Chapter give bounds on the variation distance. The geometry of 
the random walk is determined by its Cayley graph. Suppose that $\xi$ is a
random walk on $G$ with driving probability supported on a generating set
$\Sigma$. The Cayley graph of the random walk is a directed graph with vertex set
identified with $G$. For any $g \in G$, $\sigma \in \Sigma$, the vertices corresponding to the
elements $g$ and $\sigma g$ are joined by a directed edge. Thus the edge set consists
of pairs of the form $(g, \sigma g)$. The growth function of the random walk is
$V(k) := |\Sigma^k|$ and the diameter, $\Delta$, of $\xi$ is the minimum $k$ such that $V(k) = |G|$.
Say a random walk has $(A, d)$ moderate growth if

\[ \frac{V(k)}{V(\Delta)} \geq \frac{1}{A} \left( \frac{k}{\Delta} \right)^d, \qquad 1 \leq k \leq \Delta. \tag{2.10} \]


The following theorem appears in Diaconis & Saloff-Coste m- The proof 
— via the heavy machinery of path analysis, flows, two particular quadratic 
forms and some functional analysis — is omitted. More details are to be 
found in m- First an attractive lemma: 


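2.6.1 Lemma

Let $\xi$ be a symmetric random walk with diameter $\Delta$. Let $L := \min\{\nu(s) :
s \in \Sigma\}$. Then, where $\lambda_2$ is the second largest eigenvalue (see the footnote),

\[ \lambda_2 \leq 1 - \frac{L}{\Delta^2}. \tag{2.11} \]


Footnote: i.e. not necessarily $\lambda_*$.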
2.6.2 Theorem


Let $\xi$ be a symmetric random walk with $(A, d)$ moderate growth. Then for
$k = (1 + c)\Delta^2/L$, with $c > 0$:

\[ \|\nu^{\star k} - \pi\| \leq B e^{-c} \tag{2.12} \]

where B = 

Conversely, for $k = c\Delta^2/(2^{4d+2}A^2)$:

-vr|| > (2.13) 


Example: The Heisenberg Group

Consider the set of matrices:

\[ H_3(n) = \left\{ \begin{pmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{pmatrix} \right\} \tag{2.14} \]

where $a, b, c \in \mathbb{Z}_n$. With matrix multiplication modulo $n$, $H_3(n)$ forms a
group of order $n^3$. The random walk driven by the measure $\nu_n \in M_p(H_3(n))$
constant on the matrices $(a,b,c) = (\pm 1,0,0)$, $(0,0,\pm 1)$, $(0,0,0)$ is ergodic.
Diaconis & Saloff-Coste [16] have shown that the random walk has diameter
$n - 1 \leq \Delta \leq n + 1$ and volume growth function $V(k) \geq k^3/6$ ($1 \leq k \leq n + 1$).
With order $|H_3(n)| \leq 8\Delta^3$ (see the footnote),

\[ \frac{V(k)}{V(\Delta)} \geq \frac{k^3/6}{8\Delta^3} = \frac{1}{48} \left( \frac{k}{\Delta} \right)^3 \quad \text{for } 1 \leq k \leq \Delta, \]

and so the random walk has $(48, 3)$ moderate growth.

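Precise application of Theorem 2.6.2 yields for constants $A$, $A'$, $B$, $B'$:

\[ A' e^{-B'k/n^2} \leq \|\nu_n^{\star k} - \pi_n\| \leq A e^{-Bk/n^2}. \tag{2.15} \]

Hence order $n^2$ steps are necessary for convergence to random.


Footnote: $n^3 \leq 8(n - 1)^3 \leq 8\Delta^3$ for $n \geq 4$.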
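
The growth function of this walk can be explored by brute force for small $n$. The following is a minimal sketch, assuming matrices are encoded as triples $(a, b, c)$ as in (2.14); the helper names `multiply` and `growth` are illustrative and not taken from the text.

```python
def multiply(x, y, n):
    """Product of two matrices of the form (2.14), encoded as (a, b, c), mod n."""
    a1, b1, c1 = x
    a2, b2, c2 = y
    return ((a1 + a2) % n, (b1 + b2 + a1 * c2) % n, (c1 + c2) % n)

def growth(n):
    """Return [V(1), V(2), ...] up to the first k with V(k) = n^3.

    The length of the returned list is the diameter of the walk."""
    gens = [(1, 0, 0), (n - 1, 0, 0), (0, 0, 1), (0, 0, n - 1), (0, 0, 0)]
    ball = {(0, 0, 0)}                       # Sigma^0 = {identity}
    volumes = []
    while len(ball) < n ** 3:
        ball = {multiply(s, g, n) for s in gens for g in ball}   # Sigma^(k+1)
        volumes.append(len(ball))
    return volumes

volumes = growth(5)                          # n = 5 is an arbitrary small example
diameter = len(volumes)
```

Because the identity lies in the support, the sets $\Sigma^k$ are nested, so `volumes` is non-decreasing and the loop stops exactly when the whole group has been reached. Comparing `volumes[k - 1]` with $k^3/6$ gives an empirical look at the cited growth estimate.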
Chapter 3

Diaconis-Fourier Theory


Much of the preceding analysis passes neatly into the case of the classical
Markov theory for a random walk on a finite set $X$. It has been seen that
this analysis culminates in the result that the rate of convergence of the
walk to a stationary state is related heavily to the second largest eigenvalue
of the stochastic operator. As a rule the calculation of the second highest
eigenvalue is too cumbersome for larger groups and further the bound is not
particularly sharp due to the information loss in disregarding the rest of the
spectrum of the stochastic operator.

In his seminal monograph [12], Diaconis utilises the group structure to 
produce bounds for rates of convergence. He uses Fourier methods and 
representation theory to produce bounds that are invariably sharper as the 
entire spectrum is utilised. This chapter follows his approach. 

3.1 Basics of Representations and Characters 

A representation $\rho$ of a finite group $G$ is a group homomorphism from $G$
into $GL(V)$ for some vector space $V$. The dimension of the vector space (see
the footnote) is called the dimension of $\rho$ and is denoted by $d_\rho$. If $W$ is a
subspace of $V$ invariant under $\rho(G)$, then $\rho|_W$ is called a subrepresentation.


Footnote: at this point the underlying vector space may be infinite dimensional but later it
will be seen that the only representations of any interest are of finite dimension. Also
the underlying field is unspecified at this point but later it will be seen that the only
representations of any interest will be over complex vector spaces.
If $\langle\cdot,\cdot\rangle$ is an inner product on $V$, then $\langle u, v\rangle_\rho = \sum_{t \in G} \langle \rho(t)u, \rho(t)v\rangle$
defines another, and further the orthogonal complement of $W$ with respect
to $\langle\cdot,\cdot\rangle_\rho$, $W^\perp$, is also invariant under $\rho$. Hence, every representation splits
into a direct sum of subrepresentations. Both $\{0\}$ and $V$ itself yield trivial
subrepresentations. A representation $\rho$ that admits no non-trivial
subrepresentations is called irreducible. An example of an irreducible
representation is the trivial representation, $\tau$, which maps $G$ to 1: $\tau(s)z = z$,
$z \in \mathbb{C}$. Inductively, therefore, every representation is a direct sum of
irreducible representations. A quick calculation shows
$\langle\rho(s)u, \rho(s)v\rangle_\rho = \langle u, v\rangle_\rho$, hence $\|u\|_\rho = \|\rho(s)u\|_\rho$ so the operators $\rho(s)$ are
isometries and are thus unitary. Two representations, $\rho$ acting on $V$ and
$\varrho$ acting on $W$, are equivalent as representations, $\rho \equiv \varrho$, if there is a
bijective linear map $f \in L(V, W)$ such that $\varrho \circ f = f \circ \rho$. In this context $f$
is said to intertwine $\varrho$ and $\rho$.


Example: A Two Dimensional Representation of the Dihedral
Group

The dihedral group $D_4$, the group of symmetries of the square, admits a
natural representation $\rho$. The elements of $D_4$ are the rotations $r_0$, $r_{\pi/2}$,
$r_\pi$, $r_{3\pi/2}$ and reflections (12), (13), (14), (23). If the vertices of the square
are inscribed in a unit circle at the poles (see the footnote) then $\rho(r_\theta)$ are the
rotation matrices:

\[ \rho(r_\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

Similarly the reflections have action as reflection in y = x, y = —x, y = 0 
and X = 0 which have matrix representations: 


.((12))=(j ) .((13))=(-; “) 

.((14)) = ( ; ; ) .((23)) = ( V ) 


^i.e. the coordinates (±1,0), (0,±1). 
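
A quick way to see this representation concretely is to generate it from two matrices. The sketch below is illustrative: it assumes the generators are the rotation by $\pi/2$ and the reflection in the line $y = 0$, closes them under multiplication, and then evaluates $\frac{1}{|G|}\sum_s \chi(s)^2$ (the entries are real, so no conjugation is needed).

```python
R = ((0, -1), (1, 0))      # rotation by pi/2
S = ((1, 0), (0, -1))      # reflection in the line y = 0
I2 = ((1, 0), (0, 1))

def matmul(a, b):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def generate(gens):
    """Close a set of matrices under left multiplication by the generators."""
    group = {I2}
    frontier = [I2]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = matmul(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

group = generate([R, S])
# (chi|chi) for the character chi(g) = Tr g of this matrix group.
char_norm = sum((g[0][0] + g[1][1]) ** 2 for g in group) / len(group)
```

The closure has eight elements, matching $|D_4| = 8$, and a character norm of 1 is the irreducibility criterion proved later in this chapter.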
3.1.1 Schur's Lemma

Let $\rho_1 : G \to GL(V_1)$ and $\rho_2 : G \to GL(V_2)$ be two irreducible
representations of $G$, and let $f \in L(V_1, V_2)$ be an intertwiner. Then

1. If $\rho_1$ and $\rho_2$ are not equivalent, $f = 0$.

2. If $V_1 = V = V_2$ is complex, and $\rho_1 = \rho = \rho_2$, then $f = \lambda I$ for some
$\lambda \in \mathbb{C}$.

Proof. The straightforward calculations $f(\rho_1(G) \ker f) = \rho_2(G) f(\ker f) = 0$
and $\rho_2(G) \operatorname{Im} f = f(\rho_1(G) V_1)$ show that $\ker f$ and $\operatorname{Im} f$ are invariant subspaces.
By irreducibility both the kernel and image of $f$ are trivial or the whole
space.

1. Suppose $f \neq 0$. Hence $\ker f = \{0\}$ and $\operatorname{Im} f = V_2$ so $f$ is an
isomorphism as it is linear. However this would imply that $\rho_1$ and $\rho_2$ are
equivalent as representations, a contradiction. Thence $f = 0$.

2. If $f = 0$ then $f = 0 \cdot I$. Suppose again $f \neq 0$. Then $f$ has a non-zero
eigenvalue $\lambda \in \mathbb{C}$ with associated non-zero eigenvector $v_\lambda \neq 0$. Let
$f_\lambda = f - \lambda I$. A quick calculation shows that $\rho(G) f_\lambda(V) = f_\lambda(\rho(G) V)$,
hence $f_\lambda$ is an intertwiner. Note that $\ker f_\lambda \neq \{0\}$ as $v_\lambda \in \ker f_\lambda$.
Thence $\ker f_\lambda = V$, that is $f_\lambda = 0$, which implies $f = \lambda I$ •


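Let $\rho_1 : G \to GL(V_1)$ and $\rho_2 : G \to GL(V_2)$ be two irreducible
representations of $G$ and $h_0 \in L(V_1, V_2)$. Let

\[ h = \frac{1}{|G|} \sum_{t \in G} \rho_2(t^{-1})\, h_0\, \rho_1(t). \tag{3.1} \]

A quick verification shows that $h$ is an intertwiner of $\rho_1$ and $\rho_2$, and by
recourse to Schur's Lemma $h = 0$ in the case where $\rho_1 \not\equiv \rho_2$, and $h = \lambda I$
when $\rho_1 = \rho_2$. In the case $\rho_1 = \rho_2$, taking traces gives $\lambda = \operatorname{Tr} h/d_\rho$ and
a further calculation shows $\operatorname{Tr} h = \operatorname{Tr} h_0$. Suppose $\rho_1$ and $\rho_2$ are given in
matrix form as $\rho_1(s) = (r^1_{ij}(s))$ and $\rho_2(s) = (r^2_{ij}(s))$. The linear maps $h$ and
$h_0$ are defined by matrices $x_{ij}$ and $x^0_{ij}$. In particular,

\[ x_{ij} = \frac{1}{|G|} \sum_{t \in G} \sum_{k,l} r^2_{ik}(t^{-1})\, x^0_{kl}\, r^1_{lj}(t). \tag{3.2} \]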
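Suppose $\rho_1 \not\equiv \rho_2$ so that $h = 0$ when defined by $h_0 = \delta_{kl}$, the matrix with a
1 in the $(k,l)$ entry and zeros elsewhere. In this case $x_{ij} = 0$ and (3.2)
collapses to

\[ \frac{1}{|G|} \sum_{t \in G} r^2_{ik}(t^{-1})\, r^1_{lj}(t) = 0, \quad \forall\, i, k, l, j. \tag{3.3} \]

In the case where $\rho_1 = \rho_2$, $h = \lambda I$, where, in matrix elements,
$\lambda = \sum_m x^0_{mm}/d_\rho$. When $h$ is defined by $h_0 = \delta_{kl}$, (3.2) collapses to

\[ \frac{1}{|G|} \sum_{t \in G} r_{ik}(t^{-1})\, r_{lj}(t) = \frac{\delta_{ij}\delta_{kl}}{d_\rho}. \tag{3.4} \]

Note again that $\rho(s)$ is a unitary operator so that $\rho(s)^* = \rho^{-1}(s)$, thence
$r_{ij}(s^{-1}) = r_{ji}(s)^*$. A simple rearrangement of (3.3) and (3.4) using this
fact shows that the matrix elements of the irreducible representations are
orthogonal in $F(G)$.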

If $\rho$ is a representation, the character of $\rho$ is $\chi_\rho(s) := \operatorname{Tr} \rho(s)$. Using the
preceding remarks, it can be shown that the characters of the irreducible
representations are orthonormal in $F(G)$. If $\rho_1$ and $\rho_2$ are representations
with characters $\chi_1$ and $\chi_2$, by choosing a basis so that the matrix of $\rho_1 \oplus \rho_2$ is
a block $2 \times 2$ matrix with $\rho_1$ in the (1,1) position and $\rho_2$ in the (2,2) position,
taking traces shows that the character of $\rho_1 \oplus \rho_2$ is $\chi_1 + \chi_2$. Suppose now
$\rho$ is a representation with character $\phi$ that decomposes into a direct sum of
irreducible representations $\rho = \rho_1 \oplus \cdots \oplus \rho_k$. If each of the $\rho_i$ has character
$\chi_i$, then $\phi = \chi_1 + \cdots + \chi_k$. If $\rho'$ is an irreducible representation with character
$\chi$, then $(\phi|\chi) = \sum_i (\chi_i|\chi)$. By orthonormality, $(\chi_i|\chi) = 1$ or 0 according as
$\rho_i$ is, or is not, equivalent to $\rho'$. Thence, the number of $\rho_i$ equivalent to $\rho'$
equals $(\phi|\chi)$.

A canonical representation is the regular representation, defined with
respect to a complex vector space with basis $\{e_s\}$ indexed by $s \in G$ via
$r(s)(e_t) := e_{st}$. Observe that the underlying vector space is isomorphic to
$F(G)$. It is a simple exercise to show that $\chi_r(e) = |G|$ and zero elsewhere.
This implies that for an irreducible representation $\rho_i$, $(\chi_r|\chi_i) = \chi_i(e)^* =
\operatorname{Tr} I_{d_i} = d_i$ so that $\chi_r(s) = \sum_i d_i \chi_i(s)$, where the sum is over all irreducible
representations. Letting $s = e$ here yields $\sum_i d_i^2 = |G|$. Now it can be seen
that the matrix entries of the irreducible representations form an orthogonal
basis for $F(G)$ because they are orthogonal and there are $\sum_i d_i^2 = |G|$ of
them: $\dim F(G) = |G|$.
3.2 Fourier Theory


Let $f \in F(G)$ and $\rho$ a representation of $G$. The Fourier Transform of $f$ at
the representation $\rho$ is the operator $\hat{f}(\rho) = \sum_s f(s)\rho(s)$. This Fourier
transform satisfies an inversion theorem, a Plancherel Formula and, of course, a
Convolution Theorem $\widehat{f \star h}(\rho) = \hat{f}(\rho)\hat{h}(\rho)$ whose proof is rudimentary.

3.2.1 Fourier Inversion Theorem 

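Let $f \in F(G)$, then, where the sum is over irreducible representations,

\[ f(s) = \frac{1}{|G|} \sum_i d_i \operatorname{Tr}\big( \rho_i(s^{-1}) \hat{f}(\rho_i) \big). \tag{3.5} \]

Proof. Both sides are linear in $f$ so it is sufficient to check the formula for
$f = \delta^t$. Then $\hat{f}(\rho_i) = \rho_i(t)$, and the right hand side equals

\[ \frac{1}{|G|} \sum_i d_i \operatorname{Tr}\big( \rho_i(s^{-1})\rho_i(t) \big) = \frac{1}{|G|} \sum_i d_i \chi_i(s^{-1}t) = \frac{1}{|G|} \chi_r(s^{-1}t). \]

When $s = t$ this equals 1; otherwise it is 0; i.e. it equals $\delta^t(s)$ •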


3.2.2 Plancherel Formula 

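Let $f, h \in F(G)$, then

\[ \sum_{s \in G} f(s^{-1})h(s) = \frac{1}{|G|} \sum_i d_i \operatorname{Tr}\big( \hat{f}(\rho_i)\hat{h}(\rho_i) \big). \tag{3.6} \]

Proof. Both sides are linear in $f$, so consider $f = \delta^t$. Using the Fourier
Inversion Theorem

\[ h(t^{-1}) = \sum_{s \in G} \delta^t(s^{-1})h(s) = \frac{1}{|G|} \sum_i d_i \operatorname{Tr}\big( \rho_i(t)\hat{h}(\rho_i) \big). \]

However, $\rho_i(t)$ is nothing but $\widehat{\delta^t}(\rho_i)$ so the formula is verified •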
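In the sequel, mostly elements of $M_p(G)$ viewed as elements of $F(G)$ are
considered. Let $\mu \in M_p(G)$ and let $\tilde\mu(s) := \mu(s^{-1})$. After a reindex,
$\hat{\tilde\mu}(\rho) = \sum_t \mu(t)\rho(t^{-1})$. With the unitary nature of the representation, and
the fact that $\mu = \bar\mu$ as $\mu$ is real-valued, in fact $\hat{\tilde\mu}(\rho) = \hat\mu(\rho)^*$. Hence, for
$\nu \in M_p(G)$:

\[ \sum_{t \in G} \nu(t)^2 = \frac{1}{|G|} \sum_i d_i \operatorname{Tr}\big[ \hat\nu(\rho_i)\hat\nu(\rho_i)^* \big]. \tag{3.7} \]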

With the aid of two quick facts the celebrated Upper Bound Lemma of Di- 
aconis and Shahshahani [13 [H] may be proven. The first of these is the 
straightforward calculation that for all $\nu \in M_p(G)$, at the trivial
representation $\tau$, $\hat\nu(\tau) = \sum_t \nu(t) = 1$. The second comprises a lemma.

3.2.3 Lemma 

At a non-trivial irreducible representation, $\rho$, the Fourier transform of the
random distribution, $\pi$, vanishes: $\hat\pi(\rho) = 0$.

Proof. First note that $h = \sum_t \rho(t)$ is a linear map, invariant under any
$\rho(s)$: $\rho(s)h = h = h\rho(s)$. As a consequence both $\ker h$ and $\operatorname{Im} h$ are invariant
subspaces. By irreducibility, both the kernel and the image of $h$ are trivial
or the whole space. Suppose $\ker h = \{0\}$ and $\operatorname{Im} h = V$. For any $v \in V$,
$\rho(s)hv = hv$. Hit both sides with $h^{-1}$: $h^{-1}\rho(s)hv = v$. Now use the fact
that $\rho(s)$ and $h$ commute to show $\rho(s)v = v$. Hence $\rho$ is trivial, a
contradiction. Therefore $\ker h = V$, $\operatorname{Im} h = \{0\}$, i.e. $h = 0$. Now
$\hat\pi(\rho) = \sum_t \pi(t)\rho(t) = h/|G| = 0$ •

3.2.4 Upper Bound Lemma 

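Let $\nu$ be a probability on a finite group $G$. Then

\[ \|\nu - \pi\|^2 \leq \frac14 \sum_i d_i \operatorname{Tr}\big( \hat\nu(\rho_i)\hat\nu(\rho_i)^* \big) \tag{3.8} \]

where the sum is over all non-trivial irreducible representations.

Proof. Using the Cauchy-Schwarz Inequality

\[ 4\|\nu - \pi\|^2 = \Big( \sum_{t \in G} |\nu(t) - \pi(t)| \Big)^2 \leq |G| \sum_{t \in G} |\nu(t) - \pi(t)|^2 = |G| \sum_{t \in G} (\nu - \pi)(t)\,(\nu - \pi)(t), \]

where of course $\nu - \pi$ is a real function. Thus, by (3.7)

\[ 4\|\nu - \pi\|^2 \leq \sum_i d_i \operatorname{Tr}\big[ \widehat{(\nu - \pi)}(\rho_i)\, \widehat{(\nu - \pi)}(\rho_i)^* \big]. \]

Now $\widehat{(\nu - \pi)}(\rho) = \sum_t (\nu(t) - \pi(t))\rho(t) = \hat\nu(\rho) - \hat\pi(\rho)$. With the preceding
facts:

\[ \widehat{(\nu - \pi)}(\rho) = \begin{cases} 0 & \text{if } \rho \text{ is trivial} \\ \hat\nu(\rho) & \text{if } \rho \text{ is non-trivial and irreducible} \end{cases} \]

So therefore

\[ 4\|\nu - \pi\|^2 \leq \sum_i d_i \operatorname{Tr}\big( \hat\nu(\rho_i)\hat\nu(\rho_i)^* \big), \]

where the sum is over all non-trivial irreducible representations •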

These bounds are applicable to $\|\nu^{\star k} - \pi\|$ via the Convolution Theorem:
$\widehat{\nu^{\star k}}(\rho) = \hat\nu(\rho)^k$.
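
To see the Upper Bound Lemma at work on a non-Abelian group, the sketch below compares the exact variation distance with the bound (3.8) on $S_3$. Several choices here are assumptions made for illustration and are not taken from the text: the driving probability is the random transposition measure on three cards, and the two-dimensional irreducible representation is realised by restricting the permutation action on $\mathbb{R}^3$ to the plane orthogonal to $(1,1,1)$.

```python
import math
from itertools import permutations

G = list(permutations(range(3)))               # S_3; g[i] is the image of i
IDENTITY = (0, 1, 2)

def compose(g, h):
    """(g o h)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(3))

def sign(g):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if g[i] > g[j])
    return -1 if inversions % 2 else 1

# Orthonormal basis of the plane orthogonal to (1, 1, 1).
B = [(1 / math.sqrt(2), -1 / math.sqrt(2), 0.0),
     (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))]

def rho(g):
    """2x2 matrix of the action e_i -> e_{g(i)} restricted to that plane."""
    def act(v):
        w = [0.0] * 3
        for i in range(3):
            w[g[i]] = v[i]
        return w
    return [[sum(B[r][i] * act(B[c])[i] for i in range(3)) for c in range(2)]
            for r in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def convolve(mu, nu):
    """Law of the product ab with a ~ mu and b ~ nu."""
    out = {g: 0.0 for g in G}
    for a in G:
        for b in G:
            out[compose(a, b)] += mu[a] * nu[b]
    return out

# Random transpositions on 3 cards: 1/3 on the identity, 2/9 on each transposition.
nu = {g: (1 / 3 if g == IDENTITY else 2 / 9 if sign(g) == -1 else 0.0) for g in G}

def bounds(k):
    """Return (exact squared variation distance, right side of (3.8)) for nu^{*k}."""
    walk = dict(nu)
    for _ in range(k - 1):
        walk = convolve(walk, nu)
    tv = 0.5 * sum(abs(p - 1 / 6) for p in walk.values())
    nu_hat_sign = sum(nu[g] * sign(g) for g in G)            # transform at the sign rep
    M = [[sum(nu[g] * rho(g)[r][c] for g in G) for c in range(2)] for r in range(2)]
    Mk = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        Mk = matmul(Mk, M)                                    # Convolution Theorem
    trace = sum(Mk[r][c] ** 2 for r in range(2) for c in range(2))   # Tr(Mk Mk^T)
    return tv ** 2, 0.25 * (nu_hat_sign ** (2 * k) + 2 * trace)
```

Calling `bounds(k)` for a few values of `k` shows both quantities decaying geometrically, with the bound staying above the exact value. The trivial representation is omitted from the sum, exactly as in the statement of the lemma.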


3.3 Number of Irreducible Representations 

Let $G$ be a group and $g$, $h$ elements of $G$. An element $g \in G$ is conjugate
to $h$, $g \sim h$, if there exists $t \in G$ such that $h = tgt^{-1}$. Conjugacy is an
equivalence relation on a group [22], and hence forms a partition of $G$ into
disjoint conjugacy classes $G = [s_1] \cup [s_2] \cup \cdots \cup [s_l]$ where

\[ [s] = \{g \in G : \exists\, t \in G,\ g = tst^{-1}\} = \{tst^{-1} : t \in G\}. \tag{3.9} \]

A complex function $f \in F(G)$ is a class function if for all conjugacy classes
$[s_i] \subset G$, $f|_{[s_i]} = \lambda_i$, for some $\lambda_i \in \mathbb{C}$. Let $Cl(G)$ be the subspace of $F(G)$
consisting of all class functions. The characters of a representation are class
functions. Let $f \in Cl(G)$ and $\rho$ be an irreducible representation. Note that
$\rho(s)\hat{f}(\rho)\rho(s^{-1}) = \sum_t f(t)\rho(sts^{-1})$; with a reindexing $t \mapsto s^{-1}ts$, it is
clear that $\hat{f}(\rho)$ is an intertwiner for $\rho$. Thus, by Schur's Lemma, $\hat{f}(\rho) = \lambda I$.
Taking traces gives $\lambda = \operatorname{Tr}(\hat{f}(\rho))/d_\rho = \sum_t f(t)\chi(t)/d_\rho = |G|(f|\chi^*)/d_\rho$.
3.3.1 Theorem


The characters of the irreducible representations $\chi_1, \chi_2, \dots, \chi_l$ form an
orthonormal basis for $Cl(G)$.

Proof. Characters are orthonormal class functions. As $Cl(G)$ together with
$(\cdot|\cdot)$ forms an inner product space, and $\Omega = \operatorname{span}\{\chi_i\}$ is a subspace:
$Cl(G) = \Omega \oplus \Omega^\perp$. Let $f \in Cl(G)$ have the decomposition $f = g + h$, with
$g \in \Omega$, $h \in \Omega^\perp$. Therefore for all irreducible characters $\chi_i$, $(h|\chi_i) = 0$. The
preceding remarks indicate that $\hat{h}(\rho) = |G|(h|\chi^*)I/d_\rho = 0$. The Fourier
Inversion Theorem yields:

\[ h(s) = \frac{1}{|G|} \sum_i d_i \operatorname{Tr}\big( \rho_i(s^{-1})\hat{h}(\rho_i) \big) = 0. \]

Hence $\Omega^\perp = \{0\}$ and the characters of the irreducible representations
span $Cl(G)$ •

3.3.2 Theorem 

The number of irreducible representations equals the number of conjugacy 
classes. 

Proof. Theorem 3.3.1 gives the number of irreducible representations, $l$:

\[ l = \dim(Cl(G)). \]

A class function can be defined to have an arbitrary value on each conjugacy
class, so $\dim(Cl(G))$ is the number of conjugacy classes •


As an immediate corollary, all the irreducible representations of an Abelian
group $G$ have degree 1. To see this note if $G$ is Abelian, there are $|G|$
conjugacy classes, so $|G|$ terms in the sum $\sum_i d_i^2 = |G|$, each of which must
be 1. More generally, if $G$ has $l$ conjugacy classes and $l$ representations are
found, and if the $l$ representations are inequivalent and irreducible, then all
the irreducible representations have been found.
3.3.3 Theorem

Two irreducible representations with the same character are equivalent.

Proof. Suppose $\chi_1$, $\chi_2$ are identical characters of non-equivalent irreducible
representations $\rho_1$ and $\rho_2$. Then

\[ (\chi_1|\chi_2) = (\chi_1|\chi_1) = 1. \]

However the characters of inequivalent irreducible representations are
orthogonal. This is a contradiction; hence $\rho_1 \equiv \rho_2$ •

3.3.4 Theorem 

Let $\chi$ be the character of a representation $\rho$, then $\rho$ is an irreducible
representation if and only if $(\chi|\chi) = 1$.

Proof. Clearly if $\rho$ is irreducible $(\chi|\chi) = 1$. Suppose for the converse that
$(\chi|\chi) = 1$. Any representation $\rho$ is the direct sum of irreducible
representations $\{\rho_i\}$ with character $\chi = \chi_1 + \chi_2 + \cdots + \chi_m$. Then
$(\chi|\chi) = \sum_{i,j} (\chi_i|\chi_j) \geq m$, so if $(\chi|\chi)$ must equal 1 then $m = 1$: there exists a
unique $\rho_k$ such that $\rho = \rho_k$ •

Example: The Quaternion Group, Q 

Consider the quaternion group $Q = \{\pm 1, \pm i, \pm j, \pm k\}$ where 1 is the identity.
Multiplication in $Q$ is defined by $(-1)^2 = 1$ and $i^2 = j^2 = k^2 = ijk = -1$,
where $-1$ commutes with everything. The quaternion group has five
conjugacy classes $\{1\}$, $\{-1\}$, $\{\pm i\}$, $\{\pm j\}$ and $\{\pm k\}$ and thus five irreducible
representations. As $\sum_i d_i^2 = |Q| = 8$, there must be one irreducible
representation of degree 2 and four of degree 1. Consider the linear map
$\rho : Q \to GL(\mathbb{C}^2)$


given by: 



(3.10) 


Straightforward calculations show that $\rho$ is a representation. Also $(\chi|\chi) =
1$, and in light of Theorem 3.3.4 $\rho$ is the two dimensional irreducible
representation. Let $\tau : Q \to GL(\mathbb{C})$ be the trivial representation; it is the second
irreducible representation. Let $\rho_i : Q \to GL(\mathbb{C})$ (respectively $\rho_j$, $\rho_k$) be
defined by:

\[ \rho_i(s) = \begin{cases} 1 & \text{if } s \in \langle i \rangle \\ -1 & \text{if } s \notin \langle i \rangle \end{cases} \tag{3.11} \]

This is a one-dimensional representation so is irreducible. It is an easy
calculation to show that $\{\tau, \chi_i, \chi_j, \chi_k\}$ is an orthogonal set so these comprise
four inequivalent representations. Hence the set of irreducible representations
of $Q$ is given by $\{\rho, \tau, \rho_i, \rho_j, \rho_k\}$.
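
The character computations in this example can be replayed mechanically. The sketch below is a hedged illustration: the matrices of (3.10) are not reproduced here, so it assumes the standard embedding $a + bi + cj + dk \mapsto \begin{pmatrix} a + bi & c + di \\ -c + di & a - bi \end{pmatrix}$ for the two-dimensional representation; the helper names are hypothetical.

```python
def qmul(p, q):
    """Hamilton product of quaternions written as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

# The eight elements +-1, +-i, +-j, +-k of Q.
UNITS = [tuple(s if i == axis else 0 for i in range(4))
         for axis in range(4) for s in (1, -1)]

def conj(q):
    """Inverse of a unit quaternion."""
    return (q[0], -q[1], -q[2], -q[3])

def rho(q):
    """Assumed two-dimensional representation (standard complex embedding)."""
    a, b, c, d = q
    return ((complex(a, b), complex(c, d)), (complex(-c, d), complex(a, -b)))

def matmul(x, y):
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def inner(f, h):
    """(f|h) = (1/|Q|) * sum_q f(q) h(q)^*."""
    return sum(f(q) * complex(h(q)).conjugate() for q in UNITS) / 8

def chi(q):
    m = rho(q)
    return m[0][0] + m[1][1]

def axis_character(axis):
    """rho_i, rho_j, rho_k of (3.11): +1 on the subgroup generated by the axis."""
    return lambda q: 1 if all(q[i] == 0 for i in (1, 2, 3) if i != axis) else -1

is_homomorphism = all(rho(qmul(p, q)) == matmul(rho(p), rho(q))
                      for p in UNITS for q in UNITS)
classes = {frozenset(qmul(qmul(t, s), conj(t)) for t in UNITS) for s in UNITS}
one_dim = [lambda q: 1] + [axis_character(axis) for axis in (1, 2, 3)]
```

With these definitions one can confirm that there are five conjugacy classes, that `inner(chi, chi)` equals 1, and that the four one-dimensional characters are mutually orthogonal and orthogonal to `chi`.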


3.4 Simple Walk on the Circle 


Consider the walk on $(\mathbb{Z}_n, \oplus)$ driven by

\[ \nu_n(s) := \begin{cases} 1/2 & \text{if } s = \pm 1 \\ 0 & \text{otherwise} \end{cases} \tag{3.12} \]

$\mathbb{Z}_n$ is an Abelian group, so all irreducible representations have degree 1.
Any $\rho$ is determined by the image of 1: $\rho(s) = \rho(1^s) = \rho(1)^s$. Also $1^n = 0$,
hence $\rho(1)^n = \rho(1^n) = \rho(0) = 1$ so $\rho(1)$ must be an $n$-th root of unity.
There are $n$ such: $e^{2\pi i t/n}$, $t = 0, 1, 2, \dots, n - 1$. Each gives a representation
$\rho_t(s) = e^{2\pi i ts/n}$. Now some results used in the bounds below; see Appendix
A for proof.


3.4.1 Lemma 


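The following (in)equalities hold.

1. For any odd $n$ and $k \in \mathbb{N}$,

\[ \sum_{t=1}^{n-1} \cos^{2k}(2\pi t/n) = 2 \sum_{t=1}^{(n-1)/2} \cos^{2k}(\pi t/n). \tag{3.13} \]

2. For $x \in [0, \pi/2]$,

\[ \cos x \leq e^{-x^2/2}. \tag{3.14} \]

3. For any $x > 0$,

\[ \sum_{j=1}^{\infty} e^{-(j^2 - 1)x} \leq \sum_{j=0}^{\infty} e^{-3jx}. \tag{3.15} \]

4. For $x \in [0, \pi/6]$,

\[ \cos x \geq e^{-x^2/2 - x^4/2}. \tag{3.16} \]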
3.4.2 Upper and Lower Bounds

For $k \geq n^2/40$, with $n$ odd,

\[ \|\nu_n^{\star k} - \pi_n\| \leq e^{-\pi^2 k/2n^2}. \tag{3.17} \]

Conversely, for $n \geq 7$, and any $k$,

\[ \|\nu_n^{\star k} - \pi_n\| \geq \frac12 e^{-\pi^2 k/2n^2 - \pi^4 k/2n^4}. \tag{3.18} \]

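Proof. The Fourier transform of $\nu_n$ at $\rho_s$ is:

\[ \hat\nu_n(\rho_s) = \sum_{t=0}^{n-1} \nu_n(t) e^{2\pi i ts/n} = \frac12 e^{2\pi i s/n} + \frac12 e^{-2\pi i s/n} = \cos\Big(\frac{2\pi s}{n}\Big). \]

The Upper Bound Lemma and (3.13) yield

\[ \|\nu_n^{\star k} - \pi_n\|^2 \leq \frac14 \sum_{t=1}^{n-1} \cos^{2k}\Big(\frac{2\pi t}{n}\Big) = \frac12 \sum_{t=1}^{(n-1)/2} \cos^{2k}\Big(\frac{\pi t}{n}\Big). \]

Applying (3.14) yields

\[ \|\nu_n^{\star k} - \pi_n\|^2 \leq \frac12 \sum_{t=1}^{(n-1)/2} e^{-\pi^2 t^2 k/n^2} \leq \frac12 \sum_{t=1}^{\infty} e^{-\pi^2 t^2 k/n^2}, \]

and so with (3.15)

\[ \|\nu_n^{\star k} - \pi_n\|^2 \leq \frac12 e^{-\pi^2 k/n^2} \sum_{t=0}^{\infty} e^{-3\pi^2 tk/n^2} = \frac12 \frac{e^{-\pi^2 k/n^2}}{1 - e^{-3\pi^2 k/n^2}}. \]

Now since $k \geq n^2/40$, $2\big(1 - e^{-3\pi^2 k/n^2}\big) > 1$ and it follows that
$\|\nu_n^{\star k} - \pi_n\| \leq e^{-\pi^2 k/2n^2}$.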
For the lower bound, consider the norm 1 function $\phi(t) = \cos(2\pi mt/n)$, the
real part of $\rho_m(t)$, where $m = (n - 1)/2$. By Lemma 3.2.3, $\phi$ has zero expectation
under the random distribution. Now an application of (2.9) gives

\[ \|\nu_n^{\star k} - \pi_n\| \geq \frac12 \Big| \sum_{t \in G} \phi(t)\nu_n^{\star k}(t) \Big| = \frac12 \big| \widehat{\nu_n^{\star k}}(\rho_m) \big| = \frac12 |\hat\nu_n(\rho_m)|^k. \]

Now $\hat\nu_n(\rho_m) = \cos(2\pi m/n) = -\cos(\pi/n)$ by a quick calculation. By (3.16),
for $\pi/n \leq \pi/6$:

\[ \|\nu_n^{\star k} - \pi_n\| \geq \frac12 \Big| \cos\frac{\pi}{n} \Big|^k \geq \frac12 e^{-\pi^2 k/2n^2 - \pi^4 k/2n^4} \quad \bullet \]

Remark 

If $n$ is even then $\{1, -1\}$ lies in the coset of odd numbers of the normal
subgroup $\{0, 2, \dots, n - 2\} =: H \lhd \mathbb{Z}_n$, and so the walk is not ergodic by
Theorem 1.3.2.


Figure 3.1: A plot of the upper and lower bound for $n = 11$.
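
The two bounds can be compared with the exact variation distance by iterating the walk directly. This is a minimal sketch, assuming $n = 11$ as in the figure; the function names are illustrative, and `upper` and `lower` simply evaluate the right-hand sides of (3.17) and (3.18).

```python
import math

def step(dist):
    """One step of the simple walk on Z_n: move +1 or -1 with probability 1/2 each."""
    n = len(dist)
    return [0.5 * dist[(s - 1) % n] + 0.5 * dist[(s + 1) % n] for s in range(n)]

def variation_distances(n, k_max):
    """Return [||nu^{*1} - pi||, ..., ||nu^{*k_max} - pi||] for the walk started at 0."""
    dist = [1.0] + [0.0] * (n - 1)
    out = []
    for _ in range(k_max):
        dist = step(dist)
        out.append(0.5 * sum(abs(p - 1 / n) for p in dist))
    return out

def upper(n, k):
    """Right side of (3.17); only claimed for k >= n^2/40 and n odd."""
    return math.exp(-math.pi ** 2 * k / (2 * n ** 2))

def lower(n, k):
    """Right side of (3.18); claimed for n >= 7 and any k."""
    x = math.pi / n
    return 0.5 * math.exp(-(x ** 2 / 2 + x ** 4 / 2) * k)

n = 11
tv = variation_distances(n, 150)          # tv[k - 1] is the distance after k steps
```

Plotting `tv`, `upper(n, k)` and `lower(n, k)` against `k` reproduces the qualitative picture of the figure: all three decay at essentially the same exponential rate, with no abrupt transition.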
3.5 Nearest Neighbour Walk on the n-Cube

Consider the walk on $\mathbb{Z}_2^n$, $n > 1$, driven by

\[ \nu_n(s) := \begin{cases} \frac{1}{n+1} & \text{if } w(s) = 0 \text{ or } 1 \\ 0 & \text{otherwise} \end{cases} \tag{3.19} \]

where $w(s)$, the weight of $s = (s_1, s_2, \dots, s_n)$, is given by the sum in $\mathbb{N}$:

\[ w(s) = \sum_{i=1}^n s_i. \tag{3.20} \]

$\mathbb{Z}_2^n$ is an Abelian group, so all irreducible representations have degree 1.
It is a simple verification to show that each is given by $\rho_t(s) = (-1)^{t \cdot s}$.
Now some results used in the Upper Bound; see Appendix A for proof.


3.5.1 Lemma 

The following inequalities hold.

1. If $l \leq n/2$,

\[ \binom{n}{l} \Big( 1 - \frac{2l}{n+1} \Big)^{2k} \geq \binom{n}{n+1-l} \Big( 1 - \frac{2(n+1-l)}{n+1} \Big)^{2k}. \tag{3.21} \]


2. When a <b, 



(3.22) 


3. Let $n \in \mathbb{N}$, $c > 0$. If $k = (n + 1)(\log n + c)/4$,

\[ \Big( 1 - \frac{2j}{n+1} \Big)^{2k} \leq e^{-j\log n - jc}. \tag{3.23} \]


3.5.2 Upper Bound

For $k = (n + 1)(\log n + c)/4$, $c > 0$:

\[ \|\nu_n^{\star k} - \pi_n\|^2 \leq \frac12 \big( e^{e^{-c}} - 1 \big). \tag{3.24} \]
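Proof. Let $\{e_i\}$ denote the standard basis (see the footnote) of $\mathbb{Z}_2^n$:

\[ \hat\nu_n(\rho_s) = \sum_{t \in \mathbb{Z}_2^n} \nu_n(t)(-1)^{s \cdot t} = \frac{1}{n+1} \Big[ 1 + \sum_{i=1}^n (-1)^{s \cdot e_i} \Big]. \]

Now $s \cdot e_i = s_i$ so

\[
\begin{aligned}
\hat\nu_n(\rho_s) &= \frac{1}{n+1} \Big[ 1 + \sum_{i=1}^n (-1)^{s_i} \Big]
 = \frac{1}{n+1} \Big[ 1 + \sum_{s_i = 1} (-1) + \sum_{s_i = 0} (1) \Big] \\
&= \frac{1}{n+1} \big[ 1 + w(s)(-1) + (n - w(s))(1) \big]
 = \frac{n + 1 - 2w(s)}{n+1} = 1 - \frac{2w(s)}{n+1}.
\end{aligned}
\]

Thus the Upper Bound Lemma gives (summing over weights on the right
equality):

\[ \|\nu_n^{\star k} - \pi_n\|^2 \leq \frac14 \sum_{s \neq 0} \Big( 1 - \frac{2w(s)}{n+1} \Big)^{2k} = \frac14 \sum_{j=1}^n \binom{n}{j} \Big( 1 - \frac{2j}{n+1} \Big)^{2k}. \tag{3.25} \]

Let $n/2 < j \leq n$ such that $j = n + 1 - l$ (i.e. $l \in \{1, 2, \dots, \lfloor n/2 \rfloor\}$) and
consider the $(n + 1 - l)$th (i.e. $j$th) term in this sum. By (3.21), the $l$th
term dominates this term, and for $l \in \{1, 2, \dots, \lfloor n/2 \rfloor\}$,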


n -|- 1 
(3.26) 

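Noting that the 'middle' term (i.e. $n$ odd) is unaffected, (3.25) is thus
dominated by a sum of $\lceil n/2 \rceil$ terms. Therefore, with (3.22)

\[ \binom{n}{l} \Big( 1 - \frac{2l}{n+1} \Big)^{2k} + \binom{n}{n+1-l} \Big( 1 - \frac{2(n+1-l)}{n+1} \Big)^{2k} \leq 2 \binom{n}{l} \Big( 1 - \frac{2l}{n+1} \Big)^{2k}. \]


Footnote: of the finite vector space $\mathbb{Z}_2^n$ with underlying field $\mathbb{Z}_2$.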
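Applying (3.23) and noting $\binom{n}{j} \leq n^j/j!$:

\[
\begin{aligned}
\|\nu_n^{\star k} - \pi_n\|^2 &\leq \frac12 \sum_{j=1}^{\lceil n/2 \rceil} \binom{n}{j} \Big( 1 - \frac{2j}{n+1} \Big)^{2k}
 \leq \frac12 \sum_{j=1}^{\lceil n/2 \rceil} \frac{e^{j\log n}}{j!} e^{-j\log n - jc} \\
&= \frac12 \sum_{j=1}^{\lceil n/2 \rceil} \frac{(e^{-c})^j}{j!}
 \leq \frac12 \sum_{j=1}^{\infty} \frac{(e^{-c})^j}{j!}
 = \frac12 \Big( \sum_{j=0}^{\infty} \frac{(e^{-c})^j}{j!} - 1 \Big)
 = \frac12 \big( e^{e^{-c}} - 1 \big) \quad \bullet
\end{aligned}
\]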
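
For small $n$ the chain of inequalities in this proof can be checked against the exact walk. The sketch below is illustrative only: it assumes $n = 6$ and $c = 1$, encodes elements of $\mathbb{Z}_2^n$ as $n$-bit integers with XOR as the group law, and rounds $k$ up to the next integer (the Fourier sum is non-increasing in $k$, so the bound still applies).

```python
import math

def cube_walk_tv(n, k):
    """Exact ||nu_n^{*k} - pi_n|| on Z_2^n, started at the identity 0."""
    size = 1 << n
    moves = [0] + [1 << i for i in range(n)]       # the elements of weight 0 or 1
    dist = [0.0] * size
    dist[0] = 1.0
    for _ in range(k):
        new = [0.0] * size
        for s, p in enumerate(dist):
            if p:
                for m in moves:
                    new[s ^ m] += p / (n + 1)
        dist = new
    return 0.5 * sum(abs(p - 1 / size) for p in dist)

def fourier_bound(n, k):
    """Right side of (3.25): (1/4) * sum_j C(n, j) (1 - 2j/(n+1))^(2k)."""
    return 0.25 * sum(math.comb(n, j) * (1 - 2 * j / (n + 1)) ** (2 * k)
                      for j in range(1, n + 1))

n, c = 6, 1.0                                        # hypothetical example values
k = math.ceil((n + 1) * (math.log(n) + c) / 4)       # integer number of steps
tv_sq = cube_walk_tv(n, k) ** 2
closed_form = 0.5 * (math.exp(math.exp(-c)) - 1)     # right side of (3.24)
```

The three quantities `tv_sq`, `fourier_bound(n, k)` and `closed_form` should appear in increasing order; the gap between the last two shows how much is given away by the estimate $\binom{n}{j} \leq n^j/j!$ at this small size.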
Chapter 4

The Cut-Off Phenomena


4.1 Introduction

Given an ergodic random walk $\xi$, a number of techniques for bounding
$\|\nu^{\star k} - \pi\|$ have been developed. Recall the mixing time, $\tau$, as the minimum
$k$ such that $\|\nu^{\star k} - \pi\| < 1/2e$. In particular, as $\|\nu^{\star k} - \pi\|$ is decreasing in $k$,
if $\|\nu^{\star k} - \pi\| < 1/2e$, then $\tau \leq k$. In many random walks, behaviour called
the cut-off phenomenon occurs and it makes sense to talk about the mixing
time, $\tau$, as the time when $\xi$ is random.


Figure 4.1: In the cut-off phenomenon, variation distance remains close to
1 initially until the mixing time $\tau$ when it rapidly converges to 0. (Vertical
axis: $\|\nu^{\star k} - \pi\|$.)

In the cut-off phenomenon, the random walk remains far from random
until a certain time when there is a phase transition and the random walk
rapidly becomes close to random.

4.1.1 Example: Random Transpositions 

As described in Section 1.3.1, repeated random transpositions of $n$ cards can
be modelled as repeatedly convolving the measure:

\[ \nu_n(s) := \begin{cases} 1/n & \text{if } s = e \\ 2/n^2 & \text{for } s \text{ a transposition} \\ 0 & \text{otherwise} \end{cases} \]

Careful analysis of the representation theory of the symmetric group and an
application of the Upper Bound Lemma yields [12], for $k = (n\log n)/2 + cn$,
for $c > 0$:

\[ \|\nu_n^{\star k} - \pi_n\| \leq a e^{-2c} \tag{4.1} \]

for some constant $a$. For a lower bound, Diaconis considers the set $A \subset S_n$ of
permutations with one or more fixed points. Two classical results of Feller
(see the footnote) give sharp approximations of $\nu_n^{\star k}(A)$ and $\pi_n(A)$ and hence
a lower bound for the variation distance may be given. For
$k = (n\log n)/2 - cn$, $c > 0$, as $n \to \infty$:

\[ \|\nu_n^{\star k} - \pi_n\| \geq \Big( \frac1e - e^{-e^{2c}} \Big) + o(1). \tag{4.2} \]

Hence for n large, the random walk experiences a phase transition from 
order to random at tn = nlogn/2. Indeed, this was the first problem where 
a cut-off was detected ([E])- 


^namely the matching problem and the computation of the probability that when 2k 
balls are dropped into n boxes, that one or more of the boxes will be empty m 
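
For very small decks the variation distance of this walk can be computed exactly by brute force, which gives a feel for the shape of the curve even though the cut-off itself is an asymptotic statement. The sketch below is illustrative only; the function name is hypothetical, and the approach is feasible just for small $n$ since it stores all $n!$ permutations.

```python
from itertools import combinations, permutations

def transposition_walk_tv(n, k_max):
    """Exact ||nu_n^{*k} - pi_n|| for k = 1..k_max on S_n (small n only)."""
    perms = list(permutations(range(n)))
    dist = {g: 0.0 for g in perms}
    dist[tuple(range(n))] = 1.0                     # start at the identity
    pairs = list(combinations(range(n), 2))
    uniform = 1 / len(perms)
    out = []
    for _ in range(k_max):
        new = {g: p / n for g, p in dist.items()}   # identity chosen with probability 1/n
        for g, p in dist.items():
            if p:
                for i, j in pairs:                  # each transposition has probability 2/n^2
                    h = list(g)
                    h[i], h[j] = h[j], h[i]
                    new[tuple(h)] += 2 * p / n ** 2
        dist = new
        out.append(0.5 * sum(abs(p - uniform) for p in dist.values()))
    return out

tv = transposition_walk_tv(4, 12)                   # tv[k - 1] is the distance after k steps
```

Running the same function with a slightly larger deck, say `transposition_walk_tv(6, 20)`, and comparing the drop in `tv` with $(n \log n)/2$ gives a rough numerical picture of the behaviour described above.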
4.2 Formulation


There are a number of roughly equivalent formulations of the cut-off
phenomenon. The subject developed from the question: how many times must
a deck be shuffled until it is close to random? Card shuffling is modelled
by a random walk on $S_n$ where the shuffle is defined by the driving
probability $\nu \in M_p(G)$. In most cases, the driving probability $\nu$ is related to $n$
so it makes sense to talk about a natural family of random walks $(S_n, \nu_n)$.
When a good asymptote of the mixing times of these walks was accessible,
it was found in a number of examples that the cut-off behaviour becomes
sharper as $n \to \infty$. As a corollary of this development, the cut-off
phenomenon is defined with respect to the limiting behaviour of a natural
family $(G_n, \nu_n)$.

In general, a formulation will be referenced to a particular distance of 
closeness to random. Surprisingly, given different norms on Mp{G), a ran¬ 
dom walk exhibiting the cut-off phenomenon in the first need not exhibit the 
cut-off phenomenon in the second. There are a number of roughly equivalent 
formulations (see Chen’s thesis i) that introduce a window size Wn- This 
means that the variation distance goes from 1 to 0 in Wn steps rather than 1 
however these formulations still require that S> Wn such that —>■ 0 

hence there is still abrupt convergence. The original formulation of Aldous 
&: Diaconis [1] appeals to an arbitrary sharpness of convergence of variation 
distance to a step function: 

4.2.1 Definition

A family of random walks $(G_n, \nu_n)$ exhibits the cut-off phenomenon if there
exists a sequence of real numbers $\{t_n\}_{n=1}^{\infty}$ such that given $0 < \epsilon < 1$, in the
limit as $n \to \infty$, the following hold:

(a) $\|\nu_n^{*\lfloor (1+\epsilon)t_n \rfloor} - \pi_n\| \to 0$

(b) $\|\nu_n^{*\lfloor (1-\epsilon)t_n \rfloor} - \pi_n\| \to 1$

(c) $t_n \to \infty$

If $\tau_n$ is the mixing time of $(G_n, \nu_n)$ presenting cut-off, then the above
formulation implies that $\tau_n \sim t_n$ so it makes sense to say that $t_n$ is the time
taken to reach random.
Example: Walk on the n-Cube

Recall the walk on the $n$-cube from the last chapter. Along with the upper
bound extracted from the Diaconis-Fourier theory, tedious but elementary
calculations bound the variation distance away from 0 for $k = (n+1)(\log n - c)/4$,
for $n$ large and $c > 0$ ([7], Th. 2.4.3). This is done via the test
function $\phi(s) = n - 2w(s)$ whose expectation and variance under $\pi$ are easy
to calculate (namely 0 and $n$). The set $A_\beta \subset \mathbb{Z}_2^n$ is essentially defined as the
elements whose weight is sufficiently close to $n/2$ for some $\beta$:

\[ A_\beta := \{s \in \mathbb{Z}_2^n : |\phi(s)| < \beta\sqrt{n}\} \]

Use of the Markov inequality bounds $\pi_n(A_\beta)$ below by $1 - 1/\beta^2$. More intricate
calculations yield $\nu_n^{*k}(A_\beta) \le 4/\beta^2$ and thence

\[ \|\nu_n^{*k} - \pi_n\| \ge 1 - \frac{5}{\beta^2} \tag{4.3} \]

A more precise definition of $\beta$ in terms of $c$ makes this lower bound useful.
Hence it follows that the random walk has a cut-off at time $t_n = n \log n/4$.


(If $\beta = e^{c/2}/2$ then the lower bound is $1 - 20/e^c$, which clearly tends to 1 as $c$ increases.)
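
The cut-off on the cube can be observed numerically. The sketch below is illustrative and rests on an assumption that is not stated in this section: the driving measure is taken to put mass $1/(n+1)$ on the identity and on each basis vector $e_i$ (consistent with the factor $n+1$ in $k$ above). Because $\nu_n^{*k}$ is then constant on elements of equal weight, the variation distance can be computed exactly from the induced chain on weights $0, \dots, n$. The names `cube_tv_profile` and `cutoff_time` are hypothetical.

```python
from math import comb, log


def cutoff_time(n):
    """The time (n + 1) log(n) / 4 appearing in the lower bound above."""
    return (n + 1) * log(n) / 4


def cube_tv_profile(n, kmax):
    """Exact ||nu^{*k} - pi|| for k = 0..kmax, via the chain on weights.

    Assumed walk: stay put with probability 1/(n+1), otherwise flip a
    uniformly chosen coordinate (each with probability 1/(n+1)).
    """
    total = 2 ** n
    pi_weight = [comb(n, w) / total for w in range(n + 1)]
    dist = [0.0] * (n + 1)
    dist[0] = 1.0  # start at the identity, which has weight 0
    profile = []
    for _ in range(kmax + 1):
        profile.append(0.5 * sum(abs(p - q) for p, q in zip(dist, pi_weight)))
        nxt = [0.0] * (n + 1)
        for w, p in enumerate(dist):
            if p == 0.0:
                continue
            nxt[w] += p / (n + 1)                      # identity step
            if w < n:
                nxt[w + 1] += p * (n - w) / (n + 1)    # flip a 0 to a 1
            if w > 0:
                nxt[w - 1] += p * w / (n + 1)          # flip a 1 to a 0
        dist = nxt
    return profile
```

For example, with `n = 64` the profile is still well above 1/2 at half of `cutoff_time(n)` and is small at twice that time.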
Example: Simple Walk on the Circle


The simple walk on the circle does not exhibit cut-off. Considering the 
bounds developed in Section [3.4.21 note that at fe = n^/2, Wu*^ — 7r„|| < 
e~'" and due to the decreasing nature of ~ T^'nll this is an upper 
bound for all k > n^/2. Similarly at A: = j2: 




\[ \|\nu_n^{*k} - \pi_n\| \ge \frac{1}{2} e^{-3\pi^2/4 - 3\pi^4/4n^2} \underset{n \to \infty}{\longrightarrow} \frac{1}{2} e^{-3\pi^2/4} \]


and this lower bound holds for all k < j2. 


[Figure: $d(k)$ plotted against $k$.]

Figure 4.2: In the limit as $n \to \infty$ the simple walk on the circle does not
experience an abrupt transition from far from random to close to random. Note that
$d(k) := \|\nu_n^{*k} - \pi_n\|$ and the graph is not to scale.
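
The absence of cut-off can be seen by computing the variation distance exactly at times $k = \lfloor c n^2 \rfloor$. The sketch below is illustrative only: it assumes the simple walk is driven by the uniform measure on $\{\pm 1\}$ in $\mathbb{Z}_n$ with $n$ odd, and the function name `circle_tv` is hypothetical.

```python
def circle_tv(n, k):
    """Exact ||nu^{*k} - pi|| for the walk on Z_n with steps +1 and -1."""
    dist = [0.0] * n
    dist[0] = 1.0  # start at the identity 0
    for _ in range(k):
        # Each point receives half the mass of each of its two neighbours.
        dist = [0.5 * (dist[(i - 1) % n] + dist[(i + 1) % n]) for i in range(n)]
    return 0.5 * sum(abs(p - 1.0 / n) for p in dist)
```

Evaluating `circle_tv(n, int(c * n * n))` for fixed `c` and increasing odd `n` gives nearly the same value for each `n`, so on the scale $n^2$ the distance follows a fixed smooth profile in $c$ rather than sharpening into a step.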


It is an open problem to determine for which families of random walks
$(G_n, \nu_n)$ cut-off occurs. Unfortunately there does not appear to be a
nice condition for an isolated random walk $\xi$ to exhibit cut-off. In contrast,
given $G$ and $\nu \in M_p(G)$, the ergodic theorem 1.3.2 determines whether or
not $(G, \nu)$ is ergodic.
An initial attempt at reformulation would be to have as fundamental
a period of 'far from random' and a period of sharp transition to 'close to
random'. Rather than being arbitrarily far from random and arbitrarily
close to random (in the limit), this finitary formulation would have to define
controls $a, b > 0$ for far and close to random:

4.2.2 Definition

A random walk on $G$ driven by $\nu \in M_p(G)$ has $(a,b,q)$ finitary cut-off if
$A := \{k : \|\nu^{*k} - \pi\| > 1 - a\}$, $B := \{k : b < \|\nu^{*k} - \pi\| \le 1 - a\}$ and
$q = |A|/|B|$.

Therefore if $(G_n, \nu_n)$ presents cut-off, each member also has $(a_n, b_n, q_n)$
finitary cut-off, where $a_n, b_n \to 0$, $|A_n| \to \infty$, and $q_n \to \infty$. However,
consider the natural family $(\mathbb{Z}_n, \nu)$ where $\nu$ is uniform on $\{0, \pm 1\}$. This
family has $(1/2, 1/4, O(1))$ finitary cut-off but does not present the cut-off
phenomenon. For a family, therefore, presenting cut-off is strictly stronger
than presenting finitary cut-off. It is pretty clear that all random walks
have some level of finitary cut-off. Is there an appropriate level of quality of
cut-off?



Figure 4.3: In a natural definition of cut-off, the exponential function $g$
should not have cut-off. The other function, $f$, certainly exhibits some level
of cut-off.

A continuous version of $(a, b, q)$ finitary cut-off can be considered. Let
$f : \mathbb{R}_+ \to [0,1]$ be a non-increasing continuous function with $f(0) = 1$ and
$f(x) \to 0$. Then $f$ exhibits $(a,b,q)$ finitary cut-off where $A = \inf\{x : f(x) = 1-a\}$,
$B = \inf\{x : f(x) = b\}$ and $q = A/(B - A)$.

In Figure 4.3, $f$ has $(1/2e, 1/2e, 2.52)$ finitary cut-off while $g$ has $(1/2e, 1/2e, 0.14)$
finitary cut-off. In a number of examples of established cut-off, e.g. the top- 
to-random shuffle [U], it has been shown that ^ 1 doubly 

exponentially as $\epsilon \to 1$. Hence consider $(1/e^{2e}, 1/2e, 1)$ finitary cut-off as an
appropriate level for cut-off. Indeed $f$ has $(1/e^{2e}, 1/2e, 0.52)$ finitary cut-off
while $g$ has $(1/e^{2e}, 1/2e, 0.0026)$ finitary cut-off. However this too runs into
problems. Consider the family of functions $f_d(x) = (1 - \tanh(d(x - 1/2)))/2$.
This family has $(1/e^{2e}, 1/2e, 1)$ finitary cut-off for $d > 12.4$.
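
The quoted values of $q$ for the exponential function and for the $\tanh$ family can be checked numerically. The following sketch is an illustration under stated assumptions: it takes $g(x) = e^{-x}$ (the text only says that $g$ is exponential; for an exponential, $q$ does not depend on the rate), it uses $q = A/(B - A)$ with $A$ and $B$ the first crossing points of the levels $1 - a$ and $b$, and the names `first_crossing` and `finitary_q` are hypothetical.

```python
import math


def first_crossing(f, level, hi=1.0e3, tol=1.0e-12):
    """inf{x >= 0 : f(x) <= level} for continuous non-increasing f, by bisection.

    Assumes f(hi) <= level, which holds for the functions used here.
    """
    lo = 0.0
    if f(lo) <= level:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) <= level:
            hi = mid
        else:
            lo = mid
    return hi


def finitary_q(f, a, b):
    """Return q = A / (B - A) for the continuous (a, b, q) finitary cut-off."""
    A = first_crossing(f, 1.0 - a)
    B = first_crossing(f, b)
    return A / (B - A)


def g(x):
    """Exponential decay (assumed form of the function g)."""
    return math.exp(-x)


def f_d(d):
    """Member of the tanh family with steepness d."""
    return lambda x: (1.0 - math.tanh(d * (x - 0.5))) / 2.0
```

With `a = b = 1 / (2 * math.e)` the exponential gives `q` close to 0.14, and with `a = math.exp(-2 * math.e)` it gives `q` close to 0.0026, while `finitary_q(f_d(d), math.exp(-2 * math.e), 1 / (2 * math.e))` passes 1 between `d = 12.3` and `d = 12.4`.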

Diaconis remarks [12] that Aldous & Diaconis have shown that for most
probability measures on a finite group $G$, $\|\nu^{*2} - \pi\| \le 1/|G|$, so for large
groups, most random walks are random after two steps.

Therefore, without an alternative formulation of the cut-off phenomenon,
it seems likely there will never be a theorem of the form: a random walk on
$G$ with driving probability $\nu \in M_p(G)$ presents 'the' cut-off phenomenon at
time $k$ if and only if property $P$ is satisfied.

4.3 What Makes it Cut-Off? 

To demonstrate the intransigence of the problem note that the asymptotics
of a reversible random walk, $\|\nu^{*k} - \pi\| \sim C\lambda_*^k$, cannot detect cut-off. A critical
idea for understanding the cut-off phenomenon is that variation distance
is sensitive. Suppose a deck of cards is shuffled (by $\nu \in M_p(S_{52})$) but the
shuffle leaves the ace of spades at the bottom of the deck. If $A \subset S_{52}$ are
the arrangements of the deck with the ace of spades at the bottom, then
$\nu(A) = 1$ but $\pi(A) = 1/52$ and $\|\nu - \pi\| \ge 1 - 1/52$; the deck is very far
from random in variation distance! Similarly suppose that after shuffling
by $\nu$ the ace of spades is in the bottom half of the deck. By letting
$B \subset S_{52}$ be all such arrangements it is clear that $\|\nu - \pi\| \ge 1/2$. So for any
shuffle the entire deck must be well shuffled; it won't do to have even coarse
information on a single card.
To illustrate further, consider the top-to-random shuffle. This is the
shuffle that takes the top card of the deck and inserts it back into the deck
randomly. Suppose the initial arrangement has the ace of spades at the
bottom of the deck. Initially it will take a while for a card from the top to 
be placed underneath the ace of spades but eventually one will be and the 
ace of spades will be second from bottom. After a great number of shuffles 
the ace of spades will eventually surface at the top of the deck. At every 
stage up to this point, to within a statistical deviation, the ace of spades is 
in a specific portion of the deck, dependent on the number of shuffles. Hence 
up to this point the deck will be far from random. After this step however 
the ace of spades shall be placed at a random position in the deck and there 
is every chance the deck is random. It will be seen in the next chapter that 
the time for the bottom card to come to the top is essentially the time to 
random and hence the cut-off time. 


The survey article by Diaconis [?] suggests a number of reasons why cut-off
may occur. Diaconis claims that high multiplicity of the second eigenvalue
implies cut-off, after a remark of Aldous & Diaconis [1]. The result from [?]

lk*^-7r||2 >m*A* (4.4) 

has some implications for this claim in the two norm (see Chen [8]). However, 
in this thesis, cut-offs in variation norm are the subject of study. One might 
fear ‘folklore heuristic’ failure here. Indeed the claim of Diaconis is almost 
cited as fact by Hora [201 El]. Perhaps a more measured statement would be 
that to show cut-off the random walk may have to exhibit a high degree of 
symmetry which can imply high multiplicity of the second largest eigenvalue.
In the extreme case of almost all eigenvalues equal to $\lambda_*$ (remembering the
average of the eigenvalues is $\nu(e)$), the variation distance looks like $C\lambda_*^k$ and
this doesn't look like cut-off.


(The top-to-random shuffle is driven by the measure constant on the cycles $(1, m, m-1, \dots, 3, 2)$, $m = 1, \dots, 52$.)
Chen [8] discusses a conjecture of Peres that a general Markov chain
exhibits the cut-off phenomenon if and only if $\tau_n(1 - \lambda_{n*}) \to \infty$ as $n \to \infty$. Any
Markov chain with cut-off will satisfy this condition. Chen & Saloff-Coste
[9] have proved this conjecture in the $p$-norm case for $1 < p < \infty$; however
Aldous has given a Markov chain which is a counterexample in variation
distance [1]. Presently there is no known counterexample to Peres' conjecture
in the case of random walks on groups.

Theorem 2.6.2 is relevant for families of groups $(G_n, \nu_n)$ of moderate
growth with $|S|$, $A$, $d$ fixed as $n \to \infty$. These random walks take a large
multiple of the square of the diameter to get random. While a small multiple
of the square of the diameter is not sufficient
for randomness, the transition from 1 to 0 as the number of steps grows is
smooth so that the cut-off is not exhibited [?]. Diaconis [?] notes that,
via Gromov's Theorem for nilpotent groups of finite index, this result is
generic. For random walks on families of nilpotent groups where $|S|$ and the
index are bounded as $n \to \infty$, order diameter-squared steps are necessary for convergence
and there is no cut-off. Two examples of such walks are the simple walk on
the circle and the walk on the Heisenberg groups, and indeed these are the
canonical examples where cut-off does not occur.






Chapter 5 

Probabilistic Methods 


5.1 Stopping Times 

In previous chapters the convergence behaviour of random walks has been
examined. It is natural to ask questions of the type: from which time $T$
onwards does $\xi_T$ have a particular property? As a simple example of such a
random time, consider a random walk $\xi$. The lowest $T_0$ such that $\xi_{T_0} = e$ is
such a random time, namely the first return time.

To make this precise, let $\mathcal{A}_k$ be the $\sigma$-algebra generated by the random
variables $\{\xi_j : j \le k\}$, for $j, k \in \mathbb{N}_0$. Then the $\sigma$-algebra generated by the
$\sigma$-algebras $\{\mathcal{A}_k : k \in \mathbb{N}_0\}$, $\mathcal{A}$, canonically admits an increasing sequence:

\[ \mathcal{A}_0 \subset \mathcal{A}_1 \subset \cdots \subset \mathcal{A}_k \subset \cdots \subset \mathcal{A} \]

of sub-$\sigma$-algebras of $\mathcal{A}$ (i.e. a filtration). If $S(G)$ is the set of sequences
in $G$, then a stopping time is a map $T : S(G) \to \mathbb{N} \cup \{\infty\}$ which satisfies
$\{T \le k\} \in \mathcal{A}_k$ for all $k \in \mathbb{N}$.

To formalise the first example of a stopping time, the first return time,
write $T_0 = \min\{k \ge 1 : \xi_k = e\}$. Of course this generalises easily to
another example of a stopping time, namely the first hitting time, $T_g =
\min\{k \ge 0 : \xi_k = g\}$. More generally, a subset $A \subset G$ has first hitting time
$T_A = \min\{k \ge 0 : \xi_k \in A\}$.
New stopping times may be constructed from old. If $T$ and $S$ are stopping
times for a random walk $\xi$ then so are $\min\{T, S\}$, $\max\{T, S\}$, and
$T + n$, $n \in \mathbb{N}$ (see [28] for proof). The standard analysis of stopping times
involves an examination of their expectation, $\mathbb{E}_\mu T$. There is a strong
relationship between the random distribution $\pi$ and stopping times which is given
in the following proposition.

5.1.1 Proposition

Let $\xi$ be a random walk on a group $G$. Let $T \in \mathbb{N}$ be a non-zero stopping
time such that $\xi_T = e$ and $\mathbb{E}_\mu T < \infty$. Let $g \in G$. Then

\[ \mathbb{E}_\mu(\text{number of visits to } g \text{ before time } T) = \frac{\mathbb{E}_\mu T}{|G|} \]


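Proof. Taking the approach of [5] (Proposition 4, Chapter 2),
write $\rho(g) = \mathbb{E}_\mu(\text{number of visits to } g \text{ before time } T)$. Now

\[ \lambda(g) := \frac{\rho(g)}{\mathbb{E}_\mu T} = \frac{\rho(g)}{\sum_{t \in G} \rho(t)} \]

is a probability measure on $G$. Next it is claimed that

\[ \sum_{t \in G} \lambda(t) p(t,g) = \lambda(g). \tag{5.1} \]

To see this note that

\[ \lambda(g) = \frac{1}{\mathbb{E}_\mu T} \sum_{k=0}^{\infty} \mu(\xi_k = g, T > k). \]

If $g = e$, then $\mu(\xi_0 = e) = \mu(\xi_T = e) = 1$. Also, for $g \ne e$, by hypothesis,
$\mu(\xi_0 = g) = \mu(\xi_T = g) = 0$. Therefore, in the reindexing $\xi_k \to \xi_{k+1}$, the
term $\mu(\xi_0 = g)$ is replaced by $\mu(\xi_T = g)$ (in the event $T = k + 1$). Thus

\[ \lambda(g) = \frac{1}{\mathbb{E}_\mu T} \sum_{k=0}^{\infty} \mu(\xi_{k+1} = g, T > k) = \frac{1}{\mathbb{E}_\mu T} \sum_{k=0}^{\infty} \sum_{t \in G} \mu(\xi_k = t, T > k, \xi_{k+1} = g). \]

By the Markov property,

\[ \lambda(g) = \frac{1}{\mathbb{E}_\mu T} \sum_{k=0}^{\infty} \sum_{t \in G} \mu(\xi_k = t, T > k)\, p(t,g) = \sum_{t \in G} \frac{\rho(t)}{\mathbb{E}_\mu T}\, p(t,g) = \sum_{t \in G} \lambda(t) p(t,g). \]

Thus it is shown that $\lambda P = \lambda$, and so $\lambda$ is in fact the unique stationary
distribution. Consequently

\[ \lambda(g) = \frac{\rho(g)}{\mathbb{E}_\mu T} = \pi(g) \implies \rho(g) = \pi(g)\, \mathbb{E}_\mu T \quad \bullet \]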
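
Proposition 5.1.1 can be checked numerically for the first return time. The sketch below is an illustration, not part of the thesis: it takes $G = \mathbb{Z}_n$, a driving measure supplied as a dictionary from steps to probabilities, and $T$ the first return time to 0. It accumulates $\sum_k \mu(\xi_k = g, T > k)$ by iterating the walk with paths removed once they return to 0; the truncation at `steps` iterations and the name `expected_visits_before_return` are assumptions of this example.

```python
def expected_visits_before_return(n, nu, steps=5000):
    """Return (visits, expected_T) for the first return time T to 0 on Z_n.

    visits[g] approximates the expected number of visits to g before T,
    i.e. the sum over k of mu(xi_k = g, T > k), truncated after `steps` terms.
    """
    m = [0.0] * n          # m[t] = mu(xi_k = t, T > k) at the current k
    m[0] = 1.0             # at k = 0 the walk is at 0 and T > 0
    visits = [0.0] * n
    expected_T = 0.0
    for _ in range(steps):
        for t in range(n):
            visits[t] += m[t]
        expected_T += sum(m)               # adds mu(T > k)
        nxt = [0.0] * n
        for t, p in enumerate(m):
            if p > 0.0:
                for step, w in nu.items():
                    nxt[(t + step) % n] += p * w
        nxt[0] = 0.0       # paths at 0 at time k + 1 have T = k + 1
        m = nxt
    return visits, expected_T
```

For the walk on $\mathbb{Z}_7$ driven by the uniform measure on $\{0, \pm 1\}$, `expected_visits_before_return(7, {0: 1/3, 1: 1/3, -1: 1/3})` returns an expected return time close to 7 and an expected number of visits close to 1 for every element, in agreement with the proposition.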


5.2 Strong Uniform Times 

Consider the following shuffling scheme. Given a deck of $n$ cards in order
remove a random card and place it on the top of the deck. Repeat this shuffle
until the random time $T$ when every card in the deck has been touched. This
$T$ is a stopping time and further every arrangement of the deck is equally
likely at this time. Call such a stopping time a strong uniform time: a
stopping time $T$ such that $\mu(\xi_k = g \mid T = k) = 1/|G|$. Diaconis [12] remarks that
this is equivalent to $\mu(\xi_k = g \mid T \le k) = 1/|G|$.

Aldous & Diaconis [3] give a classic account of strong uniform times. For
many applications, including the random to top shuffle, the classical coupon
collector's problem is required knowledge. Consider a random sample with
replacement from a collection of $n$ coupons. Let $T$ be the number of samples
required until each coupon has been chosen at least once.
5.2.1 Coupon Collector's Bound

In the notation above, let $k = n \log n + cn$, with $c > 0$. Then

\[ \mu(T > k) \le e^{-c} \tag{5.2} \]

Proof. The proof is standard but this is taken from [?]. For each coupon
$b$, let $A_b$ be the event that coupon $b$ is not drawn in the first $k$ draws. The
probability of not picking $b$ once is $1 - 1/n$, hence $\mu(A_b) = (1 - 1/n)^k$.
Thence

\[ \mu(T > k) = \mu\left(\bigcup_b A_b\right) \le \sum_b \mu(A_b) = n\left(1 - \frac{1}{n}\right)^k \le n e^{-k/n} = e^{-c} \quad \bullet \]
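
The bound (5.2) can be compared with the exact tail of the coupon collector's time. The sketch below is illustrative only; it evaluates $\mu(T > k)$ by inclusion-exclusion over the number $j$ of coupons that are missed, and the function name `coupon_tail` is hypothetical.

```python
from math import ceil, comb, exp, log


def coupon_tail(n, k):
    """Exact mu(T > k): probability that k draws miss at least one of n coupons."""
    return sum(
        (-1) ** (j + 1) * comb(n, j) * (1.0 - j / n) ** k
        for j in range(1, n + 1)
    )
```

For a deck of `n = 52` and `c = 1`, `coupon_tail(52, ceil(52 * log(52) + 52))` can be compared directly with `exp(-1)`; the union bound used in the proof is not tight, so the exact tail is somewhat smaller.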

Recall the separation distance $s(k)$. The separation distance is related
to strong uniform times via the following theorem:

5.2.2 Theorem

If $T$ is a strong uniform time for a random walk driven by $\nu \in M_p(G)$, then
for all $k$

\[ \|\nu^{*k} - \pi\| \le s(k) \le \mu(T > k) \tag{5.3} \]

Conversely there exists a strong uniform time such that the rightmost
inequality holds with equality.

Proof. Variation distance is controlled by separation distance so it suffices
to prove the rightmost inequality. Again taking the approach of [12], let $k_0$
be the smallest $k$ such that $\mu(T \le k) > 0$. The inequality (5.3) holds if
$k_0 = \infty$ and for $k < k_0$. For $k \ge k_0$ and $s \in G$:

\[ 1 - |G|\nu^{*k}(s) \le 1 - |G|\mu(\xi_k = s, T \le k) = 1 - \underbrace{|G|\mu(\xi_k = s \mid T \le k)}_{=1} \cdot \mu(T \le k) = 1 - \mu(T \le k) = \mu(T > k) \]

Taking the maximum over $s \in G$ gives $s(k) \le \mu(T > k)$.
See [12] (Theorem 4, Chapter 4C) for the converse result •

This result along with the coupon collector's bound applies immediately
to the random to top shuffle. The upper bound proved here is supplemented
by the (tricky) second result from [12] to yield another example of a random
walk exhibiting cut-off:
5.2.3 Theorem

For the random to top shuffle, let $k = n \log n + cn$. Then

\[ \|\nu_n^{*k} - \pi\| \le e^{-c} \quad \text{for } c > 0, \tag{5.4} \]

\[ \|\nu_n^{*k} - \pi\| \to 1 \quad \text{as } n \to \infty, \text{ for negative } c = c(n) \to -\infty \tag{5.5} \]
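
The chain of inequalities (5.3) can be verified exactly for a small deck. The sketch below is an illustration under stated assumptions: it enumerates all arrangements of $n$ cards by brute force (so only small $n$ is feasible), takes $T$ to be the first time every card has been touched as in the scheme described above, and evaluates $\mu(T > k)$ by the inclusion-exclusion formula for the coupon collector's time. The name `random_to_top_profile` is hypothetical.

```python
from itertools import permutations
from math import comb


def random_to_top_profile(n, kmax):
    """Return rows (k, variation distance, separation distance, mu(T > k))."""
    decks = list(permutations(range(n)))
    size = len(decks)
    dist = {d: 0.0 for d in decks}
    dist[tuple(range(n))] = 1.0  # the deck starts in order
    rows = []
    for k in range(kmax + 1):
        tv = 0.5 * sum(abs(p - 1.0 / size) for p in dist.values())
        sep = max(1.0 - size * p for p in dist.values())
        tail = sum(
            (-1) ** (j + 1) * comb(n, j) * (1.0 - j / n) ** k
            for j in range(1, n + 1)
        )
        rows.append((k, tv, sep, tail))
        nxt = {d: 0.0 for d in decks}
        for d, p in dist.items():
            if p > 0.0:
                for i in range(n):
                    # Remove the card in position i and place it on top.
                    moved = (d[i],) + d[:i] + d[i + 1:]
                    nxt[moved] += p / n
        dist = nxt
    return rows
```

Each row returned by `random_to_top_profile(4, 20)` satisfies variation distance ≤ separation distance ≤ $\mu(T > k)$, as the theorem requires.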


5.3 Coupling 

Coupling is a theoretically stronger method than that of strong uniform
times. A coupling takes a random walk $\xi$ along with the random walk $\Pi$
(with random distribution) and couples them as a product process $(\xi_k, \Pi_k)$.
The interpretation is that the two random walks evolve until they are
equal, at which time they couple, and thereafter remain equal. More
formally a coupling of a random walk $\xi$ (with stochastic operator $P$) takes a
'random' operator $\Gamma$ on $M_p(G) \times M_p(G)$ and uses it as an input into $(\xi, \Pi)$
such that the marginal distribution of the first factor is precisely the
distribution of $\xi$. The operator must be random in the sense that $\Gamma(\mu, \pi) = (\mu P, \pi)$.
Hence $\Gamma(\nu^{*k}, \pi) = (\nu^{*(k+1)}, \pi)$. The operator must act on $M_p(G) \times M_p(G)$ in
such a way that the $\xi_k$ begin to match up with the $\Pi_k$ until all the elements
lie along the diagonal: $\xi_T = \Pi_T$. That is, after $T$ steps the process $\xi$ will have
the same distribution as the second process: after the stopping time
$k = T$ steps the walk will be random. Call such a $T$ a coupling time. For
appropriate couplings, the coupling time, $T$, may be calculated. To make
this argument precise a lemma from [12] about marginal distributions is
required.

5.3.1 Lemma

Let $G$ be a finite group. Let $\mu_1, \mu_2 \in M_p(G)$. Let $\mu \in M_p(G \times G)$ with
margins $\mu_1$, $\mu_2$. Let $\Delta = \{(s,s) : s \in G\}$ be the diagonal. Then

\[ \|\mu_1 - \mu_2\| \le \mu(\Delta^c) \]


Proof. Following Diaconis [12], let $A \subset G$. Thus

\[ |\mu_1(A) - \mu_2(A)| = |\mu(A \times G) - \mu(G \times A)| \]
\[ = |\mu((A \times G) \cap \Delta) + \mu((A \times G) \cap \Delta^c) - \mu((G \times A) \cap \Delta) - \mu((G \times A) \cap \Delta^c)| \]

The first and third quantities in the absolute sign are equal. The second
and fourth give a difference of two numbers, both smaller than $\mu(\Delta^c)$ •


5.3.2 Corollary: Coupling Inequality


If $T$ is a coupling time for a random walk driven by $\nu \in M_p(G)$, then for all
$k$

\[ \|\nu^{*k} - \pi\| \le \mu(T > k) \tag{5.6} \]

Conversely there exists a coupling such that the inequality holds with equality.


Proof. Let $\mu$ be the distribution of $(\xi_k, \Pi_k)$. Then $\mu$ has marginal
distributions $\nu^{*k}$ and $\pi$. Lemma 5.3.1 implies that

\[ \|\nu^{*k} - \pi\| \le \mu(\Delta^c) = \mu(T > k) \]

See [10] for a proof and discussion of the existence of an optimal coupling
time •


5.3.3 Example: A Walk on the n-Cube [25]

Consider the walk on $\mathbb{Z}_2^n$ driven by the measure:

\[ \nu_n(s) := \begin{cases} 1/2 & \text{if } s = e \\ 1/2n & \text{if } s = e_i \text{ for some } i \\ 0 & \text{otherwise.} \end{cases} \tag{5.7} \]

An equivalent formulation is that a coordinate is chosen uniformly from
$\{1, \dots, n\}$ and an independent coin flip determines whether the coordinate is flipped or
not. Consider the following coupling operator $\Gamma$. Suppose $\xi_k = \sum_i a_i e_i$
and coordinate $j$ is chosen at random. If the coin is heads, then $\xi_{k+1} =
\sum_{i \ne j} a_i e_i + (1 - a_j)e_j$ and the $j$th coordinate of $\Pi_{k+1}$ is $1 - a_j$. If the coin
is tails, $\xi_{k+1} = \xi_k$, and the $j$th coordinate of $\Pi_{k+1}$ is $a_j$. From the marginal
viewpoint of $\xi$ this is identical to sampling by $\nu_n$. It remains to show that
the coupling is suitably random (as described above). Suppose coordinate
$j$ is chosen. The distribution of each coordinate of $\Pi_k$ is uniform on $\{0,1\}$.
Suppose without loss of generality that the $j$th coordinate of $\xi_k$ is 1. With
equal probability the $j$th coordinate of $\Pi_{k+1}$ will be 0 or 1 by the coin flip,
hence the coupling operator is suitably random. Hence the coupling time
is when all of the coordinates $\{1, \dots, n\}$ have been chosen. The coupon
collector's bound and the coupling inequality together imply that the walk
is close to random after order $n \log n$ steps.
Chapter 6

Some New Heuristics 


6.1 The Random Walk as a Dynamical System 

Although the dynamics of a particle in a random walk are indeed random,
the dynamics of its probability distribution certainly are not. Indeed
note the probability distributions $\{\nu^{*k}\}_{k \in \mathbb{N}}$ evolve deterministically as
$\{\delta^e P^k : k \in \mathbb{N}\}$. Thus the random walk has the structure of a dynamical
system $\{M_p(G), P\}$ with fixed point attractor $\{\pi\}$. The two canonical
categories of dynamical systems (for which there is an existing literature of
powerful methods e.g. [?]) are topological and measure preserving dynamical
systems. Unfortunately at first remove $\{M_p(G), P\}$ appears too coarse
and structureless to apply any of these powerful methods. Also the mapping
function $P$ is not necessarily invertible and this poses further problems.
Indeed in many examples of walks exhibiting cut-off, $P$ may be seen to be
singular. Hence the assumption that needs to be made on $P$ to put a structure
on $\{M_p(G), P\}$ sufficient for application of dynamical systems methods
to the cut-off phenomenon is overly strict. A more fundamental problem
occurs in trying to put the structure of a measure preserving dynamical
system on the walk in that if a meaningful measure is put on $M_p(G)$, the
fact that $(M_p(G))P^k \to \{\pi\}$ as $k \to \infty$ would imply that $P$ is in fact not measure
preserving.

(A measure $\kappa$ wouldn't be very meaningful if $\kappa(M_p(G)) = \kappa(\{\pi\})$.)
6.2 Charge Theory


Two features of the ergodic random walk suggest an obvious generalisation.
The first is that a stochastic operator conserves the unit weight of $\mu \in
M_p(G)$. Suppose $u \in \mathbb{R}^{|G|}$ is a row vector of weight $q$ in the positive orthant.
A normalisation ensures $u/q \in M_p(G)$ hence $uP/q$ has weight 1 and thus
$uP$ has weight $q$. A simple calculation shows that given any row vector
$u \in \mathbb{R}^{|G|}$ of weight $q$, $uP$ also has weight $q$. Therefore stochastic operators
are weight preserving. This immediately implies that the left eigenvectors
of an ergodic stochastic operator are of weight zero: $u_i P = \lambda_i u_i$ (except $u_1$
of course).

Secondly an ergodic stochastic operator converges to $U = [1/|G|]$ (the
matrix with all entries equal to $1/|G|$), so that given a weight 1 row vector
$u$, $uP^k$ converges to $\pi$. In particular, if $\xi_0$ is distributed as any signed
probability measure (or charge: a signed measure on $G$ such that $\mu(G) = 1$) $\mu$,
the random walk will still converge to the random distribution. This allows
all manner of generalisations. For example, consider the signed stochastic
operator $Q = [\mu(ht^{-1})]_{th}$ generated by a signed probability measure $\mu$.
Under what conditions will $\delta^e Q^k$ converge to the random distribution?

6.3 Invertible Stochastic Operators 

In general a random walk need not start deterministically at $e$, but rather in
an initial distribution $\mu = \sum_{t \in G} a_t \delta^t$. However $\mu P^k = \sum_{t \in G} a_t (\delta^t P^k)$. By
right-invariance all the $\delta^t P^k \to \pi$ and hence $\mu P^k \to \pi$ for any initial distribution.
In this sense there is a loss of information about initial conditions: the walk
forgets where it began, where it was and is totally random. The dynamical
systems community make distinctions between the behaviour of invertible
and non-invertible maps, however this approach has not been exploited for
the case of a random walk on a group.
It would be desirable to quantify the 'folklore thesis' that [23]:

The loss of information about initial conditions, as the iteration
process proceeds in a chaotic regime, is associated with the
non-invertibility of the mapping function... Hence system memory
of initial conditions becomes blurred.

Consider the case of a singular and symmetric stochastic operator $P$.
The spectral theorem implies $\mathbb{R}^{|G|}$ has a basis of (left) eigenvectors of $P$.
Hence $\mathbb{R}^{|G|}$ has an eigenspace decomposition $\bigoplus V_t$, where $V_t := \ker(\lambda_t I - P)$,
where $\{\lambda_t : t \in G\}$ are the eigenvalues of $P$ (with the convention $\lambda_1 = 1$).
Consider $M_p(G) \subset \mathbb{R}^{|G|} = \bigoplus V_t$. With a non-trivial kernel, $P$ can 'destroy
information' and the naive reaction to this would be to consider $\nu^{*k} \in
V_1 \oplus \ker P$ such that $\|\nu^{*k} - \pi\| \approx 1$. Then $\|\nu^{*(k+1)} - \pi\| = 0$ and there is
cut-off. However given $\delta^e \in \bigoplus V_t$, clearly $P$ kills the $\ker P$ terms at the
very first iterate, $\delta^e P$, so this heuristic is incorrect. However in contrived
examples the sampling could be done by $\nu_1$ until $\nu_1^{*k} \in V_1 \oplus \ker P_2$ but far
from random; then sampling by $\nu_2$ (or multiplying by $P_2$) would project onto
$V_1$. See Section 6.4 for more.

6.3.1 Proposition

A stochastic operator $P$ is invertible if and only if the equation $uP = \pi$ has
the unique solution $u = \pi$.

If $P$ is an invertible stochastic operator then the following hold:

(i) If $u$ is an eigenvector of $P$, then $u$ is an eigenvector of $P^{-1}$. In
particular, $\pi P^{-1} = \pi$ and $P^{-1}k = k$ for any constant function $k \in F(G)$.

(ii) If $\{\lambda_t : t \in G\}$ are the eigenvalues of $P$, then $\{1/\lambda_t : t \in G\}$ are the
eigenvalues of $P^{-1}$. In particular, 1 is an eigenvalue of $P^{-1}$, and all
other eigenvalues of $P^{-1}$ have modulus greater than 1.

(iii) The signed probability measures on $G$, $M_1(G)$, are stable under $P^{-1}$.

(iv) For $l \in \mathbb{N}$, $\delta^e P^{-l} \in M_1(G) \setminus M_p(G)$.

Proof. If $P$ is invertible $uP = \pi$ has a unique solution. If $P$ is singular then
the kernel is non-trivial. Let $u_1 \ne u_2 \in \ker P$ be normalised such that
$v_i := \pi + u_i \in M_p(G)$; then $v_i P = \pi$.

(i) and (ii) are basic linear algebra facts.

(iii) From (i) the row and column sums of $P^{-1}$ are 1. Thence let $v \in M_1(G)$;
writing $p^{-1}(t,s)$ for the entries of $P^{-1}$,

\[ vP^{-1}(G) = \sum_{s \in G} \left( \sum_{t \in G} v(t) p^{-1}(t,s) \right) = \sum_{t \in G} v(t) \underbrace{\sum_{s \in G} p^{-1}(t,s)}_{=1} = 1 \]

(iv) From (iii), $\delta^e P^{-1} \in M_1(G)$. Assume there exists $\nu \in M_p(G)$ such that
$\nu P = \delta^e$. Now $\nu P(s) = \langle \nu, p_s \rangle$ must equal $\delta^e(s)$ where $p_s$ is the row
vector equal to the $s$-column of $P$. By Cauchy-Schwarz:

\[ |\langle \nu, p_s \rangle| \le \|\nu\|_2 \|p_s\|_2 \le \|\nu\|_1 \|p_s\|_1 \tag{6.1} \]

Because

\[ \nu P(e) = \langle \nu, p_e \rangle = 1 = \|\nu\|_1 \|p_e\|_1 \]

both inequalities in (6.1) are equalities for $s = e$. The first
equality implies that $\nu$ and $p_e$ are linearly dependent, $\nu = k p_e$. As
probability measures must have weight 1, this implies $\nu = p_e$. The
second equality implies that $\nu$ and $p_e$ are Dirac measures. Hence $\nu$ is
a Dirac measure, say $\delta^g$, and thus $P$ is not ergodic (as $S$ is a subset of
the coset $\{e\}g$ of the proper normal subgroup $\{e\}$). Inductively given
$v \in M_1(G) \setminus M_p(G)$, there does not exist $\nu \in M_p(G)$ such that $\nu P = v$
as $v$ must have negative entries but both $\nu$ and $P$ are positive •
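
Part (iv) can be illustrated on a cyclic group, where $\delta^e P^{-1}$ is simply the convolution inverse of the driving measure and can be computed through the discrete Fourier transform. The sketch below is an illustration under stated assumptions: $G = \mathbb{Z}_n$, the measure is given as a list indexed by group element, its Fourier transform is assumed to have no zeros, and the names `convolve` and `convolution_inverse` are hypothetical.

```python
import cmath


def convolve(mu, nu):
    """Convolution on Z_n: (mu * nu)(s) = sum_t mu(s - t) nu(t)."""
    n = len(mu)
    return [sum(mu[(s - t) % n] * nu[t] for t in range(n)) for s in range(n)]


def convolution_inverse(nu):
    """Return the signed measure u on Z_n with u * nu = delta^0.

    Assumes every Fourier coefficient of nu is non-zero, i.e. that the
    associated stochastic operator is invertible.
    """
    n = len(nu)
    hat = [
        sum(nu[t] * cmath.exp(-2j * cmath.pi * r * t / n) for t in range(n))
        for r in range(n)
    ]
    inverse = [
        sum(cmath.exp(2j * cmath.pi * r * t / n) / hat[r] for r in range(n)) / n
        for t in range(n)
    ]
    return [z.real for z in inverse]  # imaginary parts vanish for real nu
```

For the measure `[0.6, 0.2, 0.0, 0.0, 0.2]` on $\mathbb{Z}_5$ the result has total weight 1 but contains negative entries, so it is a charge in $M_1(G) \setminus M_p(G)$.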
6.4 Convolution Factorisations of $\pi$


Take a deck of cards and transpose the top card with a random card. Next
transpose the second card with a random card (at or underneath the second)
and continue inductively until only the bottom two cards remain.
Apply the same shuffle to the 51st card ((51,51) or (51,52)).
The first card is random, the second is random and inductively all the cards
are random. Hence considering the group $S_n$ and the measures $\nu_i$ uniform
on the transpositions $\{(i,i), (i,i+1), \dots, (i,n)\}$ the random distribution
factorises as:

\[ \pi = \nu_{n-1} \star \cdots \star \nu_2 \star \nu_1 \tag{6.2} \]
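
The factorisation (6.2) can be confirmed exactly for small $n$ with rational arithmetic. The sketch below is illustrative only; cards are indexed from 0, the convolution convention $(\mu \star \nu)(s) = \sum_t \mu(st^{-1})\nu(t)$ is an assumption of this example, and the helper names are hypothetical.

```python
from fractions import Fraction


def compose(s, t):
    """Return the permutation s∘t (apply t first, then s) as a tuple."""
    return tuple(s[i] for i in t)


def swap(n, i, j):
    """The transposition (i, j) in S_n; the identity when i == j."""
    p = list(range(n))
    p[i], p[j] = p[j], p[i]
    return tuple(p)


def convolve(mu, nu):
    """(mu * nu)(s) = sum over products a∘t = s of mu(a) nu(t)."""
    out = {}
    for a, pa in mu.items():
        for t, pt in nu.items():
            s = compose(a, t)
            out[s] = out.get(s, Fraction(0)) + pa * pt
    return out


def transposition_factorisation(n):
    """Return nu_{n-1} * ... * nu_2 * nu_1 as a dict on S_n."""
    result = {tuple(range(n)): Fraction(1)}
    for i in range(n - 1):
        # nu_i is uniform on {(i,i), (i,i+1), ..., (i,n-1)} (zero-based).
        nu_i = {swap(n, i, j): Fraction(1, n - i) for j in range(i, n)}
        result = convolve(nu_i, result)
    return result
```

For `n = 4` the result assigns exactly `Fraction(1, 24)` to each of the 24 permutations.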

Urban [?] considers the question: given a group $G$ and a symmetric set of
generators $S$, does there exist a finite number of convolutions of symmetric
measures $\{\nu_i \in M_p(G) : i = 1, \dots, m\}$ supported on $S$ such that (6.2)
holds (with $m$ rather than $n - 1$ terms)? Urban uses Diaconis-Fourier theory
(particularly Lemma 3.2.3) to show that if, at a non-trivial irreducible
representation $\rho$ of $G$, the Fourier transform of $\nu_m \star \cdots \star \nu_1$ is non-zero then
(6.2) cannot hold. Briefly, Lemma 3.2.3 states that at any non-trivial
irreducible representation, $\hat{\pi}(\rho) = 0$; and the Fourier transform of $\nu_m \star \cdots \star \nu_1$
is easily computed via the convolution theorem.

If $\nu^{*k} = \pi$ for some finite $k \in \mathbb{N}$ then the results of Section 6.3 show
that $\nu = \pi$. In particular, as $\nu$ is symmetric, $P$ has an eigenbasis, and 1
is an eigenvalue of $P$ with multiplicity 1. Suppose for contradiction that
$\nu^{*k} = \pi$ for some $k \in \mathbb{N}$, but $\nu \ne \pi$. Suppose $\delta^e \in V_1 \oplus \ker P$; then $\delta^e P = \pi$.
However $\delta^e P = \nu \star \delta^e$, and $\nu \star \delta^e = \nu$, and thus $\nu = \pi$. Hence at least
one of the eigenvectors in the eigenbasis expansion of $\delta^e$ is associated with
a non-zero eigenvalue other than 1. Thus $\delta^e P^k \ne \pi$ and hence $\nu^{*k} \ne \pi$ for any $k \in \mathbb{N}$. Note that each
of the $\nu_i$ induces a stochastic operator $P_i$ and (6.2) is equivalent to

\[ U = P_m P_{m-1} \cdots P_2 P_1 \tag{6.3} \]

Note that $U$ is singular. If each of the $P_i$ is invertible then so is $U$, a
contradiction. Therefore (6.3) cannot be true if each of the $P_i$ is invertible.
Theorem 6 on page 49 of Diaconis [12] implies that each eigenvalue of $\hat{\nu}(\rho)$,
where $\rho$ is an irreducible representation, is an eigenvalue of $P$ of multiplicity $d_\rho$.
In the case of an Abelian group, the eigenvalues of $P$ are simply given by
$\{\hat{\nu}(\rho_i) : \rho_i \text{ irreducible}\}$ and the analysis reduces to that of Urban, as
$\hat{\nu}(\rho_i) \ne 0$ for all $i$ is equivalent to 0 not being an eigenvalue of $P$; i.e. $P$ is invertible.
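Example: Simple Walk on the Circle


Let $n$ be odd and consider the set $\mathcal{M}$ of not-necessarily symmetric measures
with support $S = \{\pm 1\}$ (i.e. $\mathcal{M} = \{\nu_p \in M_p(G) : \nu_p(1) = p,\ \nu_p(-1) =
1 - p,\ p \in (0,1)\}$). Does $\pi$ admit a finite convolution factorisation of measures
from $\mathcal{M}$? For convenience denote $q := 1 - p$ and $a := p/q$. Consider the
stochastic operator associated to $\nu_p$:

\[ P_p = \begin{pmatrix} 0 & p & 0 & 0 & \cdots & 0 & q \\ q & 0 & p & 0 & \cdots & 0 & 0 \\ 0 & q & 0 & p & \cdots & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ p & 0 & 0 & 0 & \cdots & q & 0 \end{pmatrix} \]

Apply the elementary row operation $r_i \to r_i/q$ to each row and permute the
rows by $(r_n r_{n-1} r_{n-2} \cdots r_1)$:

\[ P_p \sim \begin{pmatrix} 1 & 0 & a & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & a & \cdots & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ a & 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & a & 0 & 0 & \cdots & 0 & 1 \end{pmatrix} \]

Now eliminate by $r_{n-1} \to r_{n-1} - a r_1$ and $r_n \to r_n - a r_2$:

\[ P_p \sim \begin{pmatrix} 1 & 0 & a & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & a & \cdots & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ 0 & 0 & -a^2 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & -a^2 & \cdots & 0 & 1 \end{pmatrix} \]

Now suppose $n = 2m + 1$ and continue inductively until:

\[ P_p \sim \begin{pmatrix} 1 & 0 & a & \cdots & 0 & 0 & 0 \\ \vdots & & \ddots & & & & \vdots \\ 0 & \cdots & & & 1 & 0 & a \\ 0 & \cdots & & 0 & (-1)^{m+1}a^m & 1 & 0 \\ 0 & \cdots & & 0 & 0 & (-1)^{m+1}a^m & 1 \end{pmatrix} \]

A final application of $r_{n-1} \to r_{n-1} - (-1)^{m+1}a^m r_{n-2}$ followed by
$r_n \to r_n - (-1)^{m+1}a^m r_{n-1}$ yields:

\[ P_p \sim \begin{pmatrix} 1 & 0 & a & \cdots & 0 & 0 & 0 \\ \vdots & & \ddots & & & & \vdots \\ 0 & \cdots & & & 1 & 0 & a \\ 0 & \cdots & & 0 & 0 & 1 & (-1)^{m+2}a^{m+1} \\ 0 & \cdots & & 0 & 0 & 0 & 1 + a^{2m+1} \end{pmatrix} \]

Hence the $P_p$ have $n$ pivots and are thus invertible so a finite convolution of
measures from $\mathcal{M}$ is never random.

(If $p < q$ then $a < 1$ and Gershgorin's Theorem implies that $P_p$ is invertible. If $p > q$,
then $a > 1$ and elementary row operations give $P_p$ invertible similarly. Gershgorin cannot
deal with the case $p = q$ however. Gershgorin can show $P_p$ is invertible with $n$ even when
$p \ne q$, but on this support, the walk is not ergodic.)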
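
The invertibility of $P_p$ for odd $n$ can also be checked through its eigenvalues, which for a circulant operator are the Fourier coefficients of the driving measure. The sketch below is an illustration, not the argument of the thesis: it uses the standard fact that the eigenvalues are $p\omega^r + q\omega^{-r}$ with $\omega = e^{2\pi i/n}$, and the function names are hypothetical.

```python
import cmath


def circle_eigenvalues(n, p):
    """Eigenvalues p*w^r + q*w^(-r), r = 0..n-1, of the circulant P_p on Z_n."""
    q = 1.0 - p
    return [
        p * cmath.exp(2j * cmath.pi * r / n) + q * cmath.exp(-2j * cmath.pi * r / n)
        for r in range(n)
    ]


def min_modulus_eigenvalue(n, p):
    """Smallest modulus among the eigenvalues; zero exactly when P_p is singular."""
    return min(abs(z) for z in circle_eigenvalues(n, p))


def determinant(n, p):
    """Determinant of P_p as the product of its eigenvalues."""
    det = 1.0 + 0.0j
    for z in circle_eigenvalues(n, p):
        det *= z
    return det
```

For odd `n` the smallest modulus is strictly positive for every `p` in $(0,1)$, and the determinant equals $p^n + q^n$, which matches the last pivot $1 + a^n$ above after the rows are divided by $q$. For `n = 8` and `p = 0.5` a zero eigenvalue appears, so the operator is singular.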

Urban proves a stronger result using the Diaconis-Fourier theory; namely
if $\mathcal{M}$ is a set of measures symmetric on $\{s \in \mathbb{Z}_n : |s| < n/4\}$ then there is
no $\pi$-factorisation. A quick look at the representation theory of $\mathbb{Z}_n$ shows
that the Fourier transform of these measures is bounded away from 0 and
hence so are the eigenvalues.

Example: Urban's Transposition Shuffle

Consider the convolution described at the start of this section. The final
driving measure $\nu_{n-1} = (\delta^e + \delta^{(n-1,n)})/2$ generates a singular stochastic
operator $P_{n-1}$ by Proposition 6.3.1 (v) and a slight rearrangement shows
that all of the $\nu_i$ generate singular stochastic operators.

Open Problem 

This leads onto the interesting question: 

For what measures $\nu \in M_p(G)$ is the associated stochastic
operator invertible?

A sufficient condition for invertibility guaranteed by Gershgorin's circle
theorem is that $\nu(e) > 1/2$.
6.5 Geometry of the $\|\nu^{*k} - \pi\|$ Graph

Consider an invertible symmetric ergodic stochastic operator $P$. Due to the
fact that the eigenvalues of $P^{-1}$ (except 1) all have modulus greater than 1,
the sequence $\|\delta^e P^{-k} - \pi\|$ is monotonically increasing to infinity as $k \to \infty$.

Hence the graph looks something like:

[Figure: graph of $\|\nu^{*k} - \pi\|$ against $k$.]

Figure 6.1: As $k \to -\infty$, $\delta^e P^k$ leaves $M_p(G)$ and becomes a 'big' signed
measure.


The assumption could be made that in this case the graph must be 
‘concave up’ and similarly to g{x) in Figure 14.31 does not exhibit cut-off. 
Suppose an invertible stochastic operator did show cut-off: 

Instead one might think that somehow the dashed line behaviour is necessary for cut-off to hold, and of course this behaviour cannot hold when $P$ is invertible. This leads to the conjecture: $P$ invertible implies no cut-off. However, in general, $P^{-m}(\{\delta^e\})$ is non-empty, and if a representative $u_m$ from this set is chosen, the graph of $\|u_m P^{m+k} - \pi\|$ exhibits the ‘non-dashed line’ behaviour. Note that for the random walk on the cube with loops there is no charge that is sent to $\delta^e$ by $P$. This leads onto another interesting question:

Open Problem 

For what singular stochastic operators $P$ generated by $\nu \in M_p(G)$ does there exist a charge $u \in M_1(G)$ such that $uP = \delta^e$?
Figure 6.2 (vertical axis $\|\nu^{\star k} - \pi\|$): One could conjecture that the non-dashed line behaviour, supposedly corresponding to an invertible stochastic operator, with two ‘turning points’, is impossible.


Unfortunately, the stochastic operator for the simple walk with loops on $\mathbb{Z}_2^n$ with $n$ even is invertible, so the conjecture fails. If true, the conjecture would have placed the problem in a very precarious position. Suppose $(G_n, \nu_n)$ is a family exhibiting the cut-off phenomenon (so that the stochastic operator is singular), such that $e \in \operatorname{supp}(\nu_n) = S_n$. Let $\varepsilon \in (0, 1/2)$, and transform the $\nu_n$ as:




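$$\nu'_n(s) = \begin{cases} 1 - \varepsilon & \text{if } s = e, \\ \cdots & \text{if } s \in S_n \setminus \{e\}. \end{cases} \tag{6.4}$$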


Then, by Gershgorin’s circle theorem, $P'$ would be invertible, and hence two random walks with the same support need not exhibit the same behaviour: the condition for cut-off to hold would not be on the support only. Unfortunately for those active in the field, one would assume the condition is indeed this complex.
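The effect of such a transformation can be sketched on a small example. Everything specific here is an assumption made for illustration: the group is $\mathbb{Z}_6$, the original measure is uniform on $\{0, 1, 5\}$ (which has a zero eigenvalue), and the mass $\varepsilon$ left over after putting $1 - \varepsilon$ on the identity is spread over the other support points in proportion to the original weights. That redistribution rule is a guess, not the thesis's formula (6.4).

```python
import cmath
import math

def min_modulus(nu):
    """Smallest eigenvalue modulus of the operator driven by nu on Z_n."""
    n = len(nu)
    eigs = [sum(nu[s] * cmath.exp(2j * math.pi * s * t / n) for s in range(n))
            for t in range(n)]
    return min(abs(z) for z in eigs)

def reweight(nu, eps, e=0):
    """Put mass 1 - eps on the identity; share eps over the rest of the support.

    The proportional sharing is an illustrative assumption.
    """
    off_identity = 1.0 - nu[e]
    return [1.0 - eps if s == e else eps * p / off_identity
            for s, p in enumerate(nu)]

nu = [1 / 3, 1 / 3, 0.0, 0.0, 0.0, 1 / 3]  # singular: one eigenvalue is 0
eps = 0.4
nu_prime = reweight(nu, eps)
print(min_modulus(nu), min_modulus(nu_prime))
```

The two measures have the same support, yet the first operator is singular while the second has every eigenvalue of modulus at least $1 - 2\varepsilon$.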
Chapter 7

Appendix 


7.1 Proof of Lemma 13.4.1 

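1. Claim:

$$\left|\cos\left(\frac{j\pi}{n}\right)\right| = \left|\cos\left(\frac{l\pi}{n}\right)\right| \quad \text{for any } l \in [j]. \tag{7.1}$$

Suppose $j \equiv l \bmod n$, where $l \in \{0, 1, \dots, n-1\}$, so that $j = l + mn$ for some $m \in \mathbb{Z}$. Then

$$\cos\left(\frac{j\pi}{n}\right) = \cos\left(\frac{(l + mn)\pi}{n}\right) = \cos\left(\frac{l\pi}{n} + m\pi\right) = \cos\frac{l\pi}{n}\cos m\pi - \sin\frac{l\pi}{n}\underbrace{\sin m\pi}_{=0} = (-1)^m \cos\frac{l\pi}{n}.$$

Now let $a_t = \cos(\pi t/n)$ and $b_t = \cos(2\pi t/n)$, and note that for $t = 1, 2, \dots, (n-1)/2$:

$$|a_t| = \begin{cases} |b_{(n+t)/2}| \text{ and } |b_{(n-t)/2}| & \text{if } t \text{ odd}, \\ |b_{t/2}| \text{ and } |b_{n - t/2}| & \text{if } t \text{ even}. \end{cases}$$

Hence, as $(x)^{2k} = |x|^{2k}$:

$$\sum_{t=1}^{n-1} \cos^{2k}(2\pi t/n) = 2\sum_{t=1}^{(n-1)/2} \cos^{2k}(\pi t/n).$$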
2. Let $h(x) = \log\left(e^{x^2/2}\cos x\right)$, so that $h'(x) = x - \tan x$ and $h''(x) = 1 - \sec^2 x$. Thus $h''(x) \le 0$ on $[0, \pi/2)$ and so, with $h'(0) = 0$, $h(x)$ is a decreasing function in $x$. In particular, $h(x) \le h(0) = 0$ and, as log is an increasing function, $e^{x^2/2}\cos x \le 1$ for $x \in [0, \pi/2]$.

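3. In the first instance,

$$\sum_{j=0}^{\infty} e^{-3jx} = \frac{1}{1 - e^{-3x}}$$

is a convergent geometric series when $x > 0$. Now

$$\sum_{j=0}^{\infty} e^{-3jx} = \sum_{j=1}^{\infty} e^{-3(j-1)x}.$$

Also $j^2 - 1 \ge 3(j - 1)$ for each $j \in \mathbb{N}_0$. Hence, as $e^{t}$ is increasing in $t$, $e^{-(j^2-1)x} \le e^{-3(j-1)x}$ for all $j \in \mathbb{N}_0$, and so

$$\sum_{j=1}^{\infty} e^{-(j^2-1)x} \le \sum_{j=1}^{\infty} e^{-3(j-1)x}.$$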


4. Taking the approach of [7], let $h(x) = \log\left(e^{x^2/2 + x^4/4}\cos x\right)$; then

$$\begin{aligned}
h(0) &= 0, \\
h'(x) &= x + x^3 - \tan x, & h'(0) &= 0, \\
h''(x) &= 3x^2 - \tan^2 x, & h''(0) &= 0, \\
h'''(x) &= 6x - 2\sec^2 x \tan x, & h'''(0) &= 0, \\
h^{(4)}(x) &= 6 + 4\sec^2 x - 6\sec^4 x.
\end{aligned}$$

This is a quadratic in $\sec^2 x$ which is positive when $\sec^2 x < (1 + \sqrt{10})/3$.

This translates into better than $x \in [0, \pi/6]$.
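The trigonometric facts used in parts 1, 2 and 4 can be spot-checked numerically. The sketch below is not part of the proof: it assumes $n$ odd in the power-sum identity, reads the function in part 4 as $h(x) = \log(e^{x^2/2 + x^4/4}\cos x)$ (which is what the listed derivatives imply), and uses hypothetical function names.

```python
import math

def power_sum_full(n, k):
    """Sum of cos^{2k}(2*pi*t/n) over t = 1, ..., n-1."""
    return sum(math.cos(2 * math.pi * t / n) ** (2 * k) for t in range(1, n))

def power_sum_half(n, k):
    """Twice the sum of cos^{2k}(pi*t/n) over t = 1, ..., (n-1)/2, n odd."""
    return 2 * sum(math.cos(math.pi * t / n) ** (2 * k)
                   for t in range(1, (n - 1) // 2 + 1))

def upper_bound_holds(x, tol=1e-12):
    """Part 2: cos(x) <= exp(-x^2/2), expected on [0, pi/2]."""
    return math.cos(x) <= math.exp(-x * x / 2) + tol

def lower_bound_holds(x, tol=1e-12):
    """Part 4: exp(-x^2/2 - x^4/4) <= cos(x), expected on [0, pi/6]."""
    return math.exp(-x * x / 2 - x ** 4 / 4) <= math.cos(x) + tol

identity_gaps = [abs(power_sum_full(n, k) - power_sum_half(n, k))
                 for n in (3, 5, 7, 11, 25) for k in (1, 2, 5)]
wide_grid = [i * (math.pi / 2) / 400 for i in range(401)]
narrow_grid = [i * (math.pi / 6) / 400 for i in range(401)]
print(max(identity_gaps),
      all(upper_bound_holds(x) for x in wide_grid),
      all(lower_bound_holds(x) for x in narrow_grid))
```

A grid check of this kind can only catch a misreading of the formulas; the calculus arguments above are what establish the inequalities on the whole interval.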
7.2 Proof of Lemma 13.5.1


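1. In the first instance:

$$1 - \frac{2(n+1-l)}{n+1} = 1 - 2 + \frac{2l}{n+1} = -\left(1 - \frac{2l}{n+1}\right),$$

so that

$$\left(1 - \frac{2l}{n+1}\right)^{2k} = \left(1 - \frac{2(n+1-l)}{n+1}\right)^{2k}.$$

Secondly,

$$\binom{n}{l} - \binom{n}{n+1-l} = \frac{n!}{l!\,(n-l)!} - \frac{n!}{(n+1-l)!\,(l-1)!}.$$

That is, if $l < n/2$,

$$\binom{n}{l} - \binom{n}{n+1-l} = \frac{n!}{(l-1)!\,(n-l)!}\left[\frac{1}{l} - \frac{1}{n+1-l}\right] = \frac{n!}{(l-1)!\,(n-l)!}\left[\frac{n+1-2l}{l(n+1-l)}\right] > 0,$$

and hence

$$\binom{n}{l} > \binom{n}{n+1-l}.$$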

2. By definition,

$$\binom{a}{b} = \frac{a!}{b!\,(a-b)!} = \frac{a(a-1)(a-2)\cdots(a-b+1)}{b!} \le \frac{a^b}{b!}.$$

3. It suffices to show

$$f(j) := \log\left(1 - \frac{2j}{n+1}\right)^{2k} \le -j\log n - jc =: g(j) \tag{7.3}$$

as exp is an increasing function. Now, writing $4k = (n+1)(\log n + c)$,

$$f(1) = \frac{1}{2}(n+1)(\log n + c)\log\left(1 - \frac{2}{n+1}\right), \quad \text{and} \quad g(1) = -(c + \log n).$$

Now $c = 4k/(n+1) - \log n$ so $c + \log n = 4k/(n+1)$. Therefore

$$f(1) - g(1) = \frac{4k}{n+1}\left[1 + \frac{1}{2}(n+1)\log\left(\frac{n-1}{n+1}\right)\right].$$

This is negative ($f(1) < g(1)$) if

$$1 + \frac{1}{2}(n+1)\log\left(\frac{n-1}{n+1}\right) < 0 \iff \log\frac{n-1}{n+1} < -\frac{2}{n+1} \iff h(n) := \log\frac{n+1}{n-1} - \frac{2}{n+1} > 0.$$

Now $h(2) = \log 3 - 2/3 > 0$ and

$$\lim_{n \to \infty}\left[\log\frac{n+1}{n-1} - \frac{2}{n+1}\right] = 0.$$

Differentiating with respect to $n$,

$$h'(n) = -\frac{2}{n^2 - 1} + \frac{2}{(1+n)^2} = -\frac{4}{(n+1)^2(n-1)} < 0.$$

Hence $h(n)$ is monotone decreasing from $h(2) > 0$ to 0, so is positive. Hence $f(1) < g(1)$. Now differentiating with respect to $j$,

$$f'(j) = -\frac{(n+1)(c + \log n)}{n+1-2j} = -\frac{4k}{n+1-2j} < 0 \quad (j < n/2).$$

Also

$$g'(j) = -c - \log n = -\frac{4k}{n+1}.$$

Finally, as $j > 0$, $f'(j) < g'(j)$ for all $j < n/2$. Together with $f(1) < g(1)$, this gives $f(j) < g(j)$ for all $1 \le j < n/2$.
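The three parts of this proof can be spot-checked with exact integer arithmetic for the binomial statements and floating point for (7.3). The sketch below is an illustration only: the function names are hypothetical, $k$ is treated as a real number defined by $4k = (n+1)(\log n + c)$, and $c \ge 0$ is assumed so that $k > 0$.

```python
import math

def binomial_pairing_holds(n):
    """Part 1: C(n, l) > C(n, n + 1 - l) for every 1 <= l < n/2."""
    return all(math.comb(n, l) > math.comb(n, n + 1 - l)
               for l in range(1, n) if l < n / 2)

def binomial_bound_holds(a):
    """Part 2: C(a, b) <= a^b / b!, checked as C(a, b) * b! <= a^b."""
    return all(math.comb(a, b) * math.factorial(b) <= a ** b
               for b in range(a + 1))

def inequality_7_3_holds(n, c, tol=1e-12):
    """Part 3: f(j) <= g(j) for 1 <= j < n/2, with 4k = (n+1)(log n + c)."""
    k = (n + 1) * (math.log(n) + c) / 4  # real-valued k, an assumption here
    for j in range(1, n):
        if j >= n / 2:
            break
        f = 2 * k * math.log(1 - 2 * j / (n + 1))
        g = -j * (math.log(n) + c)
        if f > g + tol:
            return False
    return True

print(all(binomial_pairing_holds(n) for n in range(3, 40)),
      all(binomial_bound_holds(a) for a in range(0, 40)),
      all(inequality_7_3_holds(n, c)
          for n in range(3, 60) for c in (0.0, 0.5, 2.0)))
```

The check of (7.3) reflects the elementary bound $\log(1 - x) \le -x$ with $x = 2j/(n+1)$, which gives the same inequality by a shorter route than the derivative comparison.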